
A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation

CoRR (2023)

Abstract
Image captioning is the challenging task of generating a textual description of an image, which combines computer vision and natural language processing techniques. This paper proposes a deep neural framework for image caption generation that uses a GRU decoder with an attention mechanism. Our approach employs multiple pre-trained convolutional neural networks as the encoder to extract image features, and a GRU-based language model as the decoder to generate descriptive sentences. To improve performance, we integrate the Bahdanau attention model with the GRU decoder so that the model learns to focus on specific parts of the image while generating each word. We evaluate the approach on the MSCOCO and Flickr30k datasets and show that it achieves scores competitive with state-of-the-art methods. The proposed framework helps bridge computer vision and natural language and can be extended to specific domains.
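
To make the encoder/attention/decoder interaction concrete, here is a minimal pure-Python sketch of one decoding step. It assumes the standard additive (Bahdanau) score, score_i = v^T tanh(W1 f_i + W2 s), a bias-free GRU cell, and a wiring common in public implementations: the attention context is concatenated with the previous word's embedding before the GRU update. The paper's exact wiring, sizes, and weights are not given here, so every dimension, the helper names (`decoder_step`, `rand_matrix`), and the random weights are illustrative assumptions. The `features` list stands in for region features from a pre-trained CNN such as InceptionV3.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bahdanau_attention(features, hidden, W1, W2, v):
    """Additive (Bahdanau) attention over image regions (biases omitted).

    score_i = v^T tanh(W1 f_i + W2 s); alpha = softmax(score);
    context = sum_i alpha_i * f_i.
    """
    proj_s = matvec(W2, hidden)
    scores = []
    for f in features:
        proj_f = matvec(W1, f)
        scores.append(sum(vj * math.tanh(a + b)
                          for vj, a, b in zip(v, proj_f, proj_s)))
    alpha = softmax(scores)
    dim = len(features[0])
    context = [sum(a * f[d] for a, f in zip(alpha, features)) for d in range(dim)]
    return context, alpha

def gru_step(x, h, params):
    """One bias-free GRU update using h' = (1 - z) * h + z * h_tilde."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, rh))]
    return [(1 - zi) * hi + zi * ht for zi, hi, ht in zip(z, h, h_tilde)]

def decoder_step(word_emb, hidden, features, att_params, gru_params):
    """Hypothetical helper: attend, concatenate context with embedding, update GRU."""
    context, alpha = bahdanau_attention(features, hidden, *att_params)
    x = context + word_emb  # list concatenation: dimension D + E
    return gru_step(x, hidden, gru_params), alpha

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Toy sizes (assumptions): N=4 regions of dim D=3, hidden H=2, attention units A=3, embedding E=2.
rng = random.Random(0)
D, H, A, E, N = 3, 2, 3, 2, 4
features = rand_matrix(N, D, rng)  # stand-in for pre-trained CNN region features
att_params = (rand_matrix(A, D, rng), rand_matrix(A, H, rng),
              [rng.uniform(-0.5, 0.5) for _ in range(A)])
# Order: Wz, Uz, Wr, Ur, Wh, Uh (W* act on the input x, U* on the hidden state).
gru_params = tuple(rand_matrix(H, D + E if i % 2 == 0 else H, rng) for i in range(6))
hidden = [0.0] * H
word_emb = [0.1, -0.2]  # hypothetical embedding of the previous word

h_new, alpha = decoder_step(word_emb, hidden, features, att_params, gru_params)
print("attention weights:", [round(a, 3) for a in alpha])
print("new hidden state:", [round(h, 3) for h in h_new])
```

The attention weights always sum to 1, so the context vector is a convex combination of region features; at inference time this step would be repeated once per generated word, with an output projection (omitted here) mapping the hidden state to vocabulary logits.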
Keywords
Image captioning, Attention mechanism, Inception V3, Convolutional Neural Network, GRU