Bridging the Gap between Expression and Scene Text for Referring Expression Comprehension (Student Abstract)

Yuqi Bu, Jiayuan Xie, Liuwu Li, Qiong Liu, Yi Cai

AAAI Conference on Artificial Intelligence (2022)

Abstract

Referring expression comprehension aims to ground the object in an image that a natural-language expression refers to. Scene text, which often serves as an identifier, has a natural advantage for referring to objects. However, existing methods consider only the text in the expression and ignore the text in the image, which leads to a mismatch between the two. In this paper, we propose a novel model that can recognize scene text. We assign the extracted scene text to its corresponding visual region and ground the target object under the guidance of the expression. Experimental results on two benchmarks demonstrate the effectiveness of our model.
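
As a rough illustration of the two steps described above (assigning scene text to regions, then grounding with the expression), the following Python sketch attaches each OCR token to the candidate region that contains most of the token's box, then picks the region whose scene text shares the most words with the expression. This is not the authors' model: the `(x1, y1, x2, y2)` box format, the `min_ratio` threshold, and the helper names `assign_scene_text` and `ground` are hypothetical, and the word-overlap score is only a toy stand-in for the learned, expression-guided cross-modal matching the abstract refers to.

```python
from dataclasses import dataclass, field

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = tuple[float, float, float, float]


@dataclass
class Region:
    box: Box
    scene_text: list[str] = field(default_factory=list)


def area(b: Box) -> float:
    # Degenerate or inverted boxes (e.g. empty intersections) have zero area.
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def overlap_ratio(token_box: Box, region_box: Box) -> float:
    """Fraction of the OCR token box that lies inside the region box."""
    inter = (
        max(token_box[0], region_box[0]),
        max(token_box[1], region_box[1]),
        min(token_box[2], region_box[2]),
        min(token_box[3], region_box[3]),
    )
    a = area(token_box)
    return area(inter) / a if a > 0 else 0.0


def assign_scene_text(regions: list[Region],
                      ocr_tokens: list[tuple[str, Box]],
                      min_ratio: float = 0.5) -> list[Region]:
    """Attach each OCR word to the region covering most of its box.

    Ties are broken in favour of the smaller region; tokens covered
    less than `min_ratio` by every region stay unassigned.
    """
    for word, tbox in ocr_tokens:
        best = max(regions,
                   key=lambda r: (overlap_ratio(tbox, r.box), -area(r.box)),
                   default=None)
        if best is not None and overlap_ratio(tbox, best.box) >= min_ratio:
            best.scene_text.append(word.lower())
    return regions


def ground(expression: str, regions: list[Region]) -> int:
    """Toy grounding: index of the region whose scene text best matches the expression."""
    words = {w.strip(".,!?\"'").lower() for w in expression.split()}
    scores = [len(words & set(r.scene_text)) for r in regions]
    return max(range(len(regions)), key=lambda i: scores[i], default=-1)


if __name__ == "__main__":
    regions = [Region((0, 0, 100, 100)), Region((120, 0, 220, 100))]
    ocr = [("STOP", (10, 10, 50, 30)), ("Pepsi", (130, 20, 180, 40))]
    assign_scene_text(regions, ocr)
    print([r.scene_text for r in regions])
    print(ground("the can with Pepsi written on it", regions))
```

Here the expression mentions "Pepsi", so the second region is selected. Using the fraction of the token box inside a region, rather than IoU, keeps a small word from being penalized for sitting inside a much larger object box.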

Keywords

Referring Expression Comprehension, Scene Text, Multi-modal Alignment