Unpaired Image Captioning by Image-Level Weakly-Supervised Visual Concept Recognition

IEEE Transactions on Multimedia (2023)

Abstract
The goal of unpaired image captioning (UIC) is to describe images without using image-caption pairs in the training phase. Although challenging, we expect that the task can be accomplished by leveraging images aligned with visual concepts. Most existing studies use off-the-shelf algorithms to obtain the visual concepts because the bounding box (BBox) labels or relationship-triplet labels used for training are expensive to acquire. To avoid exhaustive annotations, we propose a novel approach to achieve cost-effective UIC. Specifically, we adopt image-level labels to optimize the UIC model in a weakly-supervised manner. For each image, we assume that only the image-level labels are available, without object locations or instance counts. The image-level labels are used to train a weakly-supervised object recognition model that extracts object information (e.g., instances), and the extracted instances are then used to infer the relationships among different objects with an enhanced graph neural network (GNN). The proposed approach achieves comparable or even better performance than previous methods, without expensive annotations. Furthermore, we design an unrecognized object (UnO) loss to improve the alignment of the inferred object and relationship information with the images. This loss effectively alleviates a problem of existing UIC models, which tend to generate sentences that mention nonexistent objects. To the best of our knowledge, this is the first attempt to address weakly-supervised visual concept recognition for UIC (WS-UIC) based only on image-level labels. Extensive experiments demonstrate that the proposed method achieves promising results on the COCO dataset while significantly reducing the labeling cost.
Keywords
Visualization, Image recognition, Task analysis, Object detection, Training, Annotations, Data models, Graph neural network, Unpaired image captioning, Weakly-supervised instance segmentation