An Ensemble of Generation- and Retrieval-Based Image Captioning With Dual Generator Generative Adversarial Network

IEEE TRANSACTIONS ON IMAGE PROCESSING(2020)

引用 39|浏览153
暂无评分
摘要
Image captioning, which aims to generate a sentence to describe the key content of a query image, is an important but challenging task. Existing image captioning approaches can be categorised into two types: generation-based methods and retrieval-based methods. Retrieval-based methods describe images by retrieving pre-existing captions from a repository. Generation-based methods synthesize a new sentence that verbalizes the query image. Both ways have certain advantages but suffer from their own disadvantages. In the paper, we propose a novel EnsCaption model, which aims at enhancing an ensemble of retrieval-based and generation-based image captioning methods through a novel dual generator generative adversarial network. Specifically, EnsCaption is composed of a caption generation model that synthesizes tailored captions for the query image, a caption re-ranking model that retrieves the best-matching caption from a candidate caption pool consisting of generated captions and pre-retrieved captions, and a discriminator that learns the multi-level difference between the generated/retrieved captions and the ground-truth captions. During the adversarial training process, the caption generation model and the caption re-ranking model provide improved synthetic and retrieved candidate captions with high ranking scores from the discriminator, while the discriminator based on multi-level ranking is trained to assign low ranking scores to the generated and retrieved image captions. Our model absorbs the merits of both generation-based and retrieval-based approaches. We conduct comprehensive experiments to evaluate the performance of EnsCaption on two benchmark datasets: MSCOCO and Flickr-30K. Experimental results show that EnsCaption achieves impressive performance compared to the strong baseline methods.
更多
查看译文
关键词
Generators, Decoding, Generative adversarial networks, Training, Computational modeling, Task analysis, Image captioning, ensemble generation-retrieval model, adversarial learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要