Incorporating semantic consistency for improved semi-supervised image captioning

Multimedia Tools and Applications (2023)

Abstract
The high labor cost of building image captioning datasets limits the application scenarios of image captioning methods. Semi-supervised image captioning, which uses partially labeled datasets together with a large amount of unlabeled data, has therefore gained widespread attention in recent years. The key issue in current semi-supervised image captioning research is how to obtain pseudo-labels that match unlabeled images well, so that they provide valuable training samples for semi-supervised model training. To this end, we propose a semi-supervised image captioning method improved by incorporating semantic consistency (Semi-SC), which adopts both self-training and adversarial training for Teacher and Student models. Semi-SC constructs a semantic consistency discriminator that evaluates data of the two modalities using global and local semantic similarity, which helps to filter out high-quality paired pseudo-samples from the Teacher model to optimize the training of the Student model. To improve the semantic consistency between the generated captions and the original images, a semantic confidence loss is designed to inject important semantic information of images into the generated captions along with the global semantic content. Extensive experiments on the MSCOCO dataset and the Unlabeled-COCO dataset verify the effectiveness of Semi-SC, which shows significant advantages in the CIDEr and SPICE metrics, achieving 78.1% and 15.8%, respectively, in the Scarcely-paired COCO setting and outperforming other existing semi-supervised image captioning methods.
Keywords
Semi-supervised image captioning, Pseudo-label filter, Self-training, Adversarial training
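
The pseudo-label filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the Semi-SC implementation: the paper's discriminator is trained adversarially, whereas the sketch below substitutes a fixed cosine-similarity score. All names (`PseudoPair`, `consistency_score`, `filter_pseudo_pairs`), the word-to-best-region matching rule, the mixing weight `alpha`, and the `threshold` value are hypothetical choices made only for illustration, and the feature vectors are assumed to come from some image and text encoders that are not shown.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


@dataclass
class PseudoPair:
    """An unlabeled image paired with a Teacher-generated caption (hypothetical layout)."""
    image_vec: Vector           # global image feature
    caption_vec: Vector         # global caption feature
    region_vecs: List[Vector]   # local image-region features
    word_vecs: List[Vector]     # local word features


def local_similarity(pair: PseudoPair) -> float:
    """Average, over words, of each word's best-matching region (assumed rule)."""
    if not pair.region_vecs or not pair.word_vecs:
        return 0.0
    best = [max(cosine(w, r) for r in pair.region_vecs) for w in pair.word_vecs]
    return sum(best) / len(best)


def consistency_score(pair: PseudoPair, alpha: float = 0.5) -> float:
    """Blend global and local similarity; alpha is an assumed mixing weight."""
    global_sim = cosine(pair.image_vec, pair.caption_vec)
    return alpha * global_sim + (1.0 - alpha) * local_similarity(pair)


def filter_pseudo_pairs(pairs: List[PseudoPair],
                        threshold: float = 0.7,
                        alpha: float = 0.5) -> List[PseudoPair]:
    """Keep only pseudo-pairs whose score reaches the (assumed) threshold."""
    return [p for p in pairs if consistency_score(p, alpha) >= threshold]
```

In a self-training loop of this kind, the pairs that pass the filter would be added to the Student model's training set, while rejected pairs are discarded or re-captioned in a later round.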