Improving Image Paragraph Captioning with Dual Relations

Yun Liu, Yihui Shi, Fangxiang Feng, Ruifan Li, Zhanyu Ma, Xiaojie Wang

IEEE International Conference on Multimedia and Expo (ICME) (2022)


Abstract

Image paragraph captioning aims to generate multiple descriptive sentences for an image. However, most previous methods ignore the explicit relations among objects, resulting in unsatisfactory performance. In this paper, we propose a novel model (i.e., DualRel) to capture spatial and semantic relations among objects. Specifically, the spatial relation embedding is obtained solely from images using a predefined geometry pattern. With the help of captions, the semantic relation embedding is learned in a weakly supervised manner. These two relation embeddings then interact with the regional features of objects through a relation-aware attention interaction. This interaction first obtains a visual context vector using the regional features. Then, with the visual context vector, we obtain the corresponding spatial and semantic relation-aware vectors using attention. These three vectors are fused with two gates for language decoding to generate a paragraph. Experimental results on the Stanford benchmark dataset show that DualRel achieves remarkable improvements. Code released at https://github.com/fuyunll07/DualRel.
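The three-step interaction described in the abstract (visual context, then relation-aware vectors, then gated fusion) can be illustrated with a minimal NumPy sketch. This is not the released DualRel code: the scaled dot-product scoring, the use of a decoder hidden state as the first query, the additive form of the fusion, and every name here (`attend`, `relation_aware_fusion`, `W_sp`, `W_se`) are assumptions made only for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attend(query, keys):
    # Scaled dot-product attention (an assumed scoring function):
    # returns a convex combination of the rows of `keys`.
    scores = keys @ query / np.sqrt(query.shape[0])
    return softmax(scores) @ keys

def relation_aware_fusion(hidden, regions, spatial_rel, semantic_rel, W_sp, W_se):
    # Step 1: visual context vector from the regional features.
    # Using the decoder hidden state as the query is an assumption.
    visual = attend(hidden, regions)
    # Step 2: the visual context queries each set of relation embeddings.
    spatial = attend(visual, spatial_rel)
    semantic = attend(visual, semantic_rel)
    # Step 3: two sigmoid gates control how much of each relation-aware
    # vector is added to the visual context (hypothetical fusion form).
    g_sp = sigmoid(W_sp @ np.concatenate([visual, spatial]))
    g_se = sigmoid(W_se @ np.concatenate([visual, semantic]))
    return visual + g_sp * spatial + g_se * semantic

# Toy usage with random inputs: 5 regions, 20 relation embeddings, dimension 8.
rng = np.random.default_rng(0)
d = 8
fused = relation_aware_fusion(
    hidden=rng.normal(size=d),
    regions=rng.normal(size=(5, d)),
    spatial_rel=rng.normal(size=(20, d)),
    semantic_rel=rng.normal(size=(20, d)),
    W_sp=rng.normal(size=(d, 2 * d)),
    W_se=rng.normal(size=(d, 2 * d)),
)
```

In this sketch the gates are computed per dimension, so each relation type can contribute to some feature channels and be suppressed in others. How the paper parameterizes its gates is not stated in the abstract.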


Keywords

Image paragraph captioning, relation embeddings, relation-aware attention
