Captioning transformer with scene graph guiding

ICIP (2021)

Abstract
Image captioning is a challenging task that aims to generate descriptions of images. Most existing approaches adopt an encoder-decoder architecture, in which the encoder takes the image as input and the decoder predicts the corresponding word sequence. However, a common defect of these methods is that they ignore the abundant semantic relationships between relevant regions, which leads the decoder to produce misleading captions. To alleviate this issue, we propose a novel model that uses the semantic relationships provided by a scene graph to guide the word generation process. To some extent, the scene graph narrows the semantic gap between images and descriptions and hence improves the quality of the generated sentences. Extensive experimental results demonstrate that our model achieves superior performance on various quantitative metrics.
Keywords
Image captioning, Scene graph, Attention, Semantic relationship, Deep Neural Network