
Structured Attention Network for Referring Image Segmentation

IEEE Transactions on Multimedia (2022)

Abstract
Referring image segmentation aims at segmenting out the object or stuff referred to by a natural language expression. The challenge of this task lies in the requirement of understanding both vision and language. The linguistic structure of a referring expression can provide an intuitive and explainable layout for reasoning over visual and linguistic concepts. In this paper, we propose a structured attention network (SANet) to explore the multimodal reasoning over the dependency tree parsed from the referring expression. Specifically, SANet implements the multimodal reasoning using an attentional multimodal tree-structure recurrent module (AMTreeGRU) in a bottom-up manner. In addition, for spatial detail improvement, SANet further incorporates the semantics-guided low-level features into high-level ones using the proposed attentional skip connection module. Extensive experiments on four public benchmark datasets demonstrate the superiority of our proposed SANet with more explainable visualization examples.
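The bottom-up, tree-structured multimodal reasoning described in the abstract can be sketched in a toy form: each node of the dependency tree fuses its own (word-conditioned) feature with an attention-pooled summary of its children's states, proceeding from leaves to the root. This is a minimal illustrative sketch, not the authors' AMTreeGRU — the class name `TreeAttnGRU`, the single-gate update, and the dot-product child attention are all assumptions made here for clarity.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

class TreeAttnGRU:
    """Hypothetical minimal bottom-up tree recurrence with attention over children.
    Not the paper's AMTreeGRU; a sketch of the general idea only."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.Wz = rng.standard_normal((dim, 2 * dim)) * 0.1  # update-gate weights
        self.Wh = rng.standard_normal((dim, 2 * dim)) * 0.1  # candidate-state weights
        self.attn = rng.standard_normal(dim) * 0.1           # child-attention query
        self.dim = dim

    def node_state(self, x, child_states):
        # Pool children with attention; leaves get a zero context.
        if child_states:
            C = np.stack(child_states)          # (k, dim)
            scores = C @ self.attn              # one logit per child
            ctx = softmax(scores) @ C           # attention-weighted child summary
        else:
            ctx = np.zeros(self.dim)
        xc = np.concatenate([x, ctx])           # fuse node feature with child context
        z = 1.0 / (1.0 + np.exp(-(self.Wz @ xc)))  # update gate
        h_tilde = np.tanh(self.Wh @ xc)            # candidate state
        return (1.0 - z) * ctx + z * h_tilde       # gated combination

    def forward(self, feats, children, root):
        # Post-order traversal: children are computed before their parent,
        # so information flows bottom-up to the root of the dependency tree.
        def rec(i):
            states = [rec(c) for c in children.get(i, [])]
            return self.node_state(feats[i], states)
        return rec(root)
```

For example, for an expression like "man in red shirt", the dependency root "man" would aggregate the state of its modifier subtree before producing the final reasoning state; in the paper this root state would then guide segmentation, with the attentional skip connection injecting semantics-guided low-level features for spatial detail.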
Keywords
Visualization, Linguistics, Image segmentation, Cognition, Feature extraction, Semantics, Task analysis, Referring image segmentation, vision and language, cross-modal reasoning