Enhanced Soft Attention Mechanism With An Inception-Like Module For Image Captioning

2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), 2020

Abstract
Visual soft attention has been widely adopted in image captioning models. The Traditional Soft Attention Mechanism (TSAM) assigns a weight to each region by using a multi-layer perceptron whose input is that region's own features. Because image classification networks extract regional features based on spatial locations, TSAM fails to adequately consider the spatial context of each region, which leads to unreasonable weight distributions. In this paper, we introduce a flexible and universal attention framework with an inception-like module, named the Enhanced Soft Attention Mechanism (ESAM), which can balance the attention levels of adjacent regions and alleviate the problem caused by local features with weak representational ability. Furthermore, we add an LSTM to the attention module so that it can take the previous attention distribution into account while generating the current word. Experimental results show that ESAM significantly surpasses TSAM, by 4.1% on BLEU-4 and 2.7% on CIDEr, and that it also achieves better results when its universality is verified under the same experimental setups.
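The per-region scoring that the abstract attributes to TSAM can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `soft_attention`, the parameter names `W_v`, `W_h`, and `w`, the `tanh` activation, and the use of a decoder hidden state as a second input are all assumptions borrowed from common soft-attention formulations. The inception-like module and the attention LSTM of ESAM are not reproduced, since the abstract does not specify them.

```python
import math


def soft_attention(regions, hidden, W_v, W_h, w):
    """Score each region with a small MLP, then normalise with a softmax.

    regions: list of k region feature vectors, each of length d
    hidden:  decoder state vector of length m (an assumed extra input)
    W_v:     a x d projection for region features (hypothetical name)
    W_h:     a x m projection for the decoder state (hypothetical name)
    w:       length-a vector mapping the hidden layer to a scalar score
    Returns (weights, context): k attention weights and a length-d context.
    """
    def matvec(M, x):
        return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

    h_proj = matvec(W_h, hidden)

    # Each score depends only on that region's own features (plus the shared
    # decoder state); no information from neighbouring regions is used.
    scores = []
    for v in regions:
        v_proj = matvec(W_v, v)
        act = [math.tanh(a + b) for a, b in zip(v_proj, h_proj)]
        scores.append(sum(w_i * a_i for w_i, a_i in zip(w, act)))

    # Numerically stable softmax over the k regions.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the region features.
    d = len(regions[0])
    context = [
        sum(weights[i] * regions[i][j] for i in range(len(regions)))
        for j in range(d)
    ]
    return weights, context
```

In this sketch the regions interact only through the softmax normalisation, so the ratio between the weights of any two regions is unaffected by the features of a third. That is one way to read the abstract's remark that TSAM does not adequately consider spatial context.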
Keywords
image captioning, soft attention, inception