Multilayer Dense Attention Model for Image Caption.

Ke Wang, Xun Zhang, Fan Wang, Tsu-Yang Wu, Chien-Ming Chen

IEEE Access (2019)

Abstract

Image captioning is a technology that uses machines to understand the content of images and generate descriptive text. With the development of deep learning, using it to understand image content and generate descriptive text has become a hot research topic. This paper proposes a multilayer dense attention model for image captioning. A Faster region-based convolutional neural network (Faster R-CNN) is employed as the encoding layer to extract image features, a long short-term memory network with attention (LSTM-attend) serves as the decoder of the multilayer dense attention model, and the description text is generated. The model parameters are optimized using policy gradient optimization from reinforcement learning. The use of dense attention mechanisms in the encoding layer can effectively avoid interference from non-salient information and selectively output the corresponding description text during decoding. Experimental results on general-domain images validate the model's ability to understand images and generate text.
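
The attention step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's multilayer dense attention model, whose details the abstract does not give. It assumes plain dot-product scoring, and the names `attend`, `regions`, and `state` are hypothetical stand-ins for region features from the encoder and the decoder's LSTM hidden state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Soft attention with dot-product scoring (an assumption, not the paper's method).

    region_features: list of K vectors, stand-ins for encoder region features.
    hidden_state: one vector of the same length, a stand-in for the LSTM state.
    Returns (weights, context); context is the weighted sum of the regions.
    """
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(hidden_state)
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy input: three 2-dimensional "regions" and one decoder state.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
state = [2.0, 0.0]
weights, context = attend(regions, state)
```

Here the first region aligns best with the decoder state, so it receives the largest weight and dominates the context vector; regions that score low contribute little, which is the sense in which attention suppresses non-salient information.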

Keywords

Attention, image caption, LSTM, R-CNN
