Improving Image Paragraph Captioning with Dual Relations

IEEE International Conference on Multimedia and Expo (ICME), 2022

Abstract
Image paragraph captioning aims to generate multiple descriptive sentences for an image. However, most previous methods ignore the explicit relations among objects, resulting in unsatisfactory performance. In this paper, we propose a novel model (i.e., DualRel) to capture spatial and semantic relations among objects. Specifically, the spatial relation embedding is obtained solely from images using a predefined geometry pattern. With the help of captions, the semantic relation embedding is learned in a weakly supervised manner. These two relation embeddings then interact with the regional features of objects through a relation-aware attention interaction. This interaction first obtains a visual context vector using regional features. Then, with the visual context vector, we obtain the corresponding spatial and semantic relation-aware vectors using attention. These three vectors are fused with two gates for language decoding to further generate a paragraph. Experimental results on the Stanford benchmark dataset show that DualRel achieves remarkable improvements. Code released at https://github.com/fuyunll07/DualRel.
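The relation-aware attention interaction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract gives no formulas, so everything here is an assumption made for illustration: the scaled dot-product attention form, the decoder hidden state as the first query, the sigmoid gates computed from concatenated vectors, the additive fusion, and all function and parameter names (`attend`, `gate`, `fuse`, `w_spatial`, `w_semantic`).

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, items):
    """Scaled dot-product attention of one query over a list of vectors.

    Returns the attention-weighted sum of `items` and the weights.
    (Assumed attention form; the abstract does not specify it.)
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, item) / scale for item in items])
    dim = len(items[0])
    context = [sum(w * item[j] for w, item in zip(weights, items))
               for j in range(dim)]
    return context, weights


def gate(weight_rows, inputs):
    """Element-wise sigmoid gate: sigmoid(W @ inputs)."""
    return [1.0 / (1.0 + math.exp(-dot(row, inputs))) for row in weight_rows]


def fuse(hidden, regions, spatial_rel, semantic_rel, w_spatial, w_semantic):
    """Fuse visual, spatial and semantic context vectors with two gates."""
    # Step 1: visual context vector from regional features
    # (hypothetical query: the decoder hidden state).
    visual, _ = attend(hidden, regions)
    # Step 2: the visual context queries each set of relation embeddings.
    spatial, _ = attend(visual, spatial_rel)
    semantic, _ = attend(visual, semantic_rel)
    # Step 3: two gates decide how much of each relation-aware vector to add.
    g_sp = gate(w_spatial, visual + spatial)    # list concatenation
    g_se = gate(w_semantic, visual + semantic)  # list concatenation
    return [v + a * s + b * t
            for v, a, s, b, t in zip(visual, g_sp, spatial, g_se, semantic)]


# Toy usage with random inputs (all sizes are arbitrary).
random.seed(0)
d, n_obj, n_rel = 4, 3, 5


def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


hidden = rand_vec(d)
regions = [rand_vec(d) for _ in range(n_obj)]
spatial_rel = [rand_vec(d) for _ in range(n_rel)]
semantic_rel = [rand_vec(d) for _ in range(n_rel)]
w_spatial = [rand_vec(2 * d) for _ in range(d)]
w_semantic = [rand_vec(2 * d) for _ in range(d)]

fused = fuse(hidden, regions, spatial_rel, semantic_rel, w_spatial, w_semantic)
print(len(fused))
```

In this sketch the fused vector has the same dimensionality as a regional feature, so it could be passed to a language decoder at each step. The released code at the URL above remains the reference for how DualRel actually computes these quantities.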
Keywords
Image paragraph captioning, relation embeddings, relation-aware attention