
Structured Attention Network for Referring Image Segmentation

IEEE Transactions on Multimedia (2022)

Abstract
Referring image segmentation aims at segmenting out the object or stuff referred to by a natural language expression. The challenge of this task lies in the requirement of understanding both vision and language. The linguistic structure of a referring expression can provide an intuitive and explainable layout for reasoning over visual and linguistic concepts. In this paper, we propose a structured attention network (SANet) to explore the multimodal reasoning over the dependency tree parsed from the referring expression. Specifically, SANet implements the multimodal reasoning using an attentional multimodal tree-structure recurrent module (AMTreeGRU) in a bottom-up manner. In addition, for spatial detail improvement, SANet further incorporates the semantics-guided low-level features into high-level ones using the proposed attentional skip connection module. Extensive experiments on four public benchmark datasets demonstrate the superiority of our proposed SANet with more explainable visualization examples.
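The bottom-up, tree-structured multimodal reasoning described in the abstract can be sketched in a toy form: each node of the dependency tree fuses its own (word-conditioned) feature with an attention-pooled summary of its children's states, proceeding from leaves to the root. This is a minimal illustrative sketch, not the authors' AMTreeGRU — the class name `TreeAttnGRU`, the single-gate update, and the dot-product child attention are all assumptions made here for clarity.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

class TreeAttnGRU:
    """Hypothetical minimal bottom-up tree recurrence with attention over children.
    Not the paper's AMTreeGRU; a sketch of the general idea only."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.Wz = rng.standard_normal((dim, 2 * dim)) * 0.1  # update-gate weights
        self.Wh = rng.standard_normal((dim, 2 * dim)) * 0.1  # candidate-state weights
        self.attn = rng.standard_normal(dim) * 0.1           # child-attention query
        self.dim = dim

    def node_state(self, x, child_states):
        # Pool children with attention; leaves get a zero context.
        if child_states:
            C = np.stack(child_states)          # (k, dim)
            scores = C @ self.attn              # one logit per child
            ctx = softmax(scores) @ C           # attention-weighted child summary
        else:
            ctx = np.zeros(self.dim)
        xc = np.concatenate([x, ctx])           # fuse node feature with child context
        z = 1.0 / (1.0 + np.exp(-(self.Wz @ xc)))  # update gate
        h_tilde = np.tanh(self.Wh @ xc)            # candidate state
        return (1.0 - z) * ctx + z * h_tilde       # gated combination

    def forward(self, feats, children, root):
        # Post-order traversal: children are computed before their parent,
        # so information flows bottom-up to the root of the dependency tree.
        def rec(i):
            states = [rec(c) for c in children.get(i, [])]
            return self.node_state(feats[i], states)
        return rec(root)
```

For example, for an expression like "man in red shirt", the dependency root "man" would aggregate the state of its modifier subtree before producing the final reasoning state; in the paper this root state would then guide segmentation, with the attentional skip connection injecting semantics-guided low-level features for spatial detail.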
Keywords
Visualization, Linguistics, Image segmentation, Cognition, Feature extraction, Semantics, Task analysis, Referring image segmentation, vision and language, cross-modal reasoning