
Visual Relational Reasoning for Image Caption.

IEEE International Joint Conference on Neural Networks (2020)

Abstract
Recently, various attention-based networks have achieved state-of-the-art results on image captioning tasks. However, this simple mechanism is insufficient for modelling and reasoning about the relationships between visual regions that are required for scene understanding. In this work, we propose a visual relational reasoning module that implicitly learns semantic and spatial relationships between pairs of relevant visual objects and infers the feature output most relevant to the currently generated word. Furthermore, a context gate is introduced to dynamically control the contributions of the visual region attention module and the visual relational reasoning module, allowing different words to be predicted from different types of features (visual or visual-relationship). We evaluate our model on the MSCOCO dataset and achieve state-of-the-art results. Qualitative analysis shows that our visual relational reasoning model can dynamically model and reason over the features most relevant to each type of generated word, improving caption quality.
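The abstract's two ingredients can be sketched concretely: pairwise relational reasoning pools a projection of every ordered pair of region features, and the context gate blends the attended visual feature with the relational feature via a learned sigmoid gate. The sketch below is a minimal, hypothetical numpy rendering of that idea; the projection matrices (`w_pair`, `w_gate`), the ReLU nonlinearity, and the mean pooling are assumptions for illustration, not the paper's exact architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relational_features(regions, w_pair):
    """Pool a projected feature over all ordered pairs of visual regions.

    regions: (n, d) array of region features.
    w_pair:  (2d, d) projection matrix (hypothetical parameterization).
    Returns a single (d,) relational feature.
    """
    n, d = regions.shape
    pair_sum = np.zeros(d)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            pair = np.concatenate([regions[i], regions[j]])  # (2d,) pair encoding
            pair_sum += np.maximum(pair @ w_pair, 0.0)       # ReLU-projected pair
    return pair_sum / (n * (n - 1))                          # mean over ordered pairs

def context_gate(attn_feat, rel_feat, w_gate):
    """Blend attention and relational features with a scalar sigmoid gate."""
    g = sigmoid(np.concatenate([attn_feat, rel_feat]) @ w_gate)  # gate in (0, 1)
    return g * attn_feat + (1.0 - g) * rel_feat

# Toy run with random features standing in for detector outputs.
rng = np.random.default_rng(0)
n, d = 4, 8
regions = rng.normal(size=(n, d))
w_pair = rng.normal(size=(2 * d, d)) * 0.1
w_gate = rng.normal(size=(2 * d,)) * 0.1

rel = relational_features(regions, w_pair)
attn = regions.mean(axis=0)  # stand-in for an attended region feature
fused = context_gate(attn, rel, w_gate)
print(fused.shape)
```

In a full captioning decoder, the gate input would also condition on the decoder's hidden state at each time step, so the blend can shift toward visual or relational evidence depending on the word being generated.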
Keywords
Visualization,Cognition,Semantics,Decoding,Feature extraction,Logic gates,Context modeling