Incorporating semantic consistency for improved semi-supervised image captioning

Multimedia Tools and Applications (2023)

Abstract

The high labor cost of building image captioning datasets limits the application scenarios of image captioning methods. Semi-supervised image captioning, which combines a partially labeled dataset with a large amount of unlabeled data, has therefore gained widespread attention in recent years. The key issue in current semi-supervised image captioning research is how to obtain pseudo-labels that match the unlabeled images well, so that they provide valuable training samples for the semi-supervised model. To this end, we propose a semi-supervised image captioning method improved by incorporating semantic consistency (Semi-SC), which adopts both self-training and adversarial training for the Teacher and Student models. Semi-SC constructs a semantic consistency discriminator that evaluates the two modalities (image and caption) using global and local semantic similarity. The discriminator helps to filter high-quality paired pseudo-samples from the Teacher model's output, which are then used to optimize the training of the Student model. To improve the semantic consistency between the generated captions and the original images, a semantic confidence loss is designed to inject important semantic information from the images, together with their global semantic content, into the generated captions. Extensive experiments on the MSCOCO dataset and the Unlabeled-COCO dataset verify the effectiveness of Semi-SC. It shows significant advantages on the CIDEr and SPICE metrics, achieving 78.1% and 15.8% respectively in the Scarcely-paired COCO setting and outperforming other existing semi-supervised image captioning methods.
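
The pseudo-label filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes a discriminator trained adversarially, whereas the sketch below substitutes fixed cosine similarity over precomputed embeddings. The function names, the `alpha` weighting, the `threshold` value, and the dictionary keys are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def consistency_score(img_global, cap_global, img_regions, cap_words, alpha=0.5):
    """Blend a global and a local image-caption similarity (assumed weighting).

    Global: similarity of whole-image and whole-caption embeddings.
    Local: each word embedding is matched to its most similar region
    embedding, and the best matches are averaged.
    """
    global_sim = cosine(img_global, cap_global)
    if img_regions and cap_words:
        local_sim = sum(
            max(cosine(word, region) for region in img_regions)
            for word in cap_words
        ) / len(cap_words)
    else:
        local_sim = 0.0
    return alpha * global_sim + (1.0 - alpha) * local_sim


def filter_pseudo_pairs(pairs, threshold=0.7, alpha=0.5):
    """Keep only Teacher-generated pairs whose score reaches the threshold."""
    return [
        pair for pair in pairs
        if consistency_score(
            pair["img_global"], pair["cap_global"],
            pair["img_regions"], pair["cap_words"], alpha,
        ) >= threshold
    ]


# Toy embeddings: the first pair is well aligned, the second is orthogonal.
toy_pairs = [
    {"id": "good", "img_global": [1.0, 0.0], "cap_global": [1.0, 0.0],
     "img_regions": [[1.0, 0.0], [0.0, 1.0]],
     "cap_words": [[1.0, 0.0], [0.0, 1.0]]},
    {"id": "bad", "img_global": [1.0, 0.0], "cap_global": [0.0, 1.0],
     "img_regions": [[1.0, 0.0]], "cap_words": [[0.0, 1.0]]},
]
kept = filter_pseudo_pairs(toy_pairs)
print([pair["id"] for pair in kept])
```

In a pipeline of the kind the abstract outlines, the retained pairs would serve as additional training samples for the Student model; how the paper combines the global and local terms and sets its acceptance criterion is not stated in the abstract.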

Keywords

Semi-supervised image captioning, Pseudo-label filter, Self-training, Adversarial training
