Image captioning based on scene graphs: A survey

SSRN Electronic Journal (2023)

Abstract
Although recent developments in deep learning have brought several tasks closer to human performance, a significant gap between human and machine performance remains in certain image captioning tasks. Image captioning is the process of generating a textual description of an image. It focuses on recognizing the main regions of an image, their attributes, and their relationships, and it aims to produce descriptions that are both syntactically and semantically correct. For simple images, deep learning-based techniques handle the intricacies and constraints of description well. However, constructing sentences for complex scenes with many entities and relationships remains challenging: for example, achieving diversity, anchoring, and controllability at the same time, which humans do with apparent ease. Scene graphs can significantly alleviate this problem by fully exploiting the spatial and semantic information in an image. Despite these promising results, however, existing findings are fragmented and lack a systematic comparative overview. In this survey, we provide a comprehensive overview of the available scene-graph-based image captioning methods. We discuss the foundations of these techniques and examine their performance, strengths, and limitations. We also compare state-of-the-art methods and review the datasets and evaluation metrics commonly used. Finally, we conclude with an in-depth discussion of present and future research challenges. This survey aims to help readers understand how scene graphs can be applied to image captioning.
Keywords
Deep learning, Image captioning, Scene graph, Spatial and semantic information