Improving the Quality of Video-to-Language Models by Optimizing Annotation of the Training Material.
Lecture Notes in Computer Science (2018)
Abstract
Automatic video captioning is one of the ultimate challenges of Natural Language Processing, boosted by the omnipresence of video and the release of large-scale annotated video benchmarks. However, the specificity and quality of the captions vary considerably, having an adverse effect on the quality of the trained captioning models. In this work, we address this issue by proposing automatic strategies for optimizing the annotations of video material, removing annotations that are not semantically relevant and generating new and more informative captions. We evaluate our approach on the MSR-VTT challenge with a state-of-the-art deep learning video-to-language model. Our code is available at https://github.com/lpmayos/mcv_thesis.
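The annotation-optimization idea described above — discarding captions that are not semantically relevant to the rest of a video's annotations — can be illustrated with a minimal sketch. The paper uses learned semantic sentence similarity; here, as a simplified stand-in, token-level Jaccard similarity scores each caption against its peers, and the `0.2` threshold is an arbitrary illustrative value, not one from the paper:

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Token-level Jaccard similarity (a crude stand-in for the
    semantic sentence similarity used in the paper)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def filter_captions(captions: list[str], threshold: float = 0.2) -> list[str]:
    """Keep each caption only if its mean similarity to the other
    captions of the same video meets the (hypothetical) threshold."""
    kept = []
    for i, cap in enumerate(captions):
        others = [c for j, c in enumerate(captions) if j != i]
        mean_sim = sum(jaccard_similarity(cap, o) for o in others) / len(others)
        if mean_sim >= threshold:
            kept.append(cap)
    return kept
```

For example, given three captions for one video where one is off-topic, the outlier's mean similarity to the others falls below the threshold and it is dropped, while the two consistent captions survive.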
Keywords
Video-to-language, Video captioning, Video understanding, Text annotation optimization, Semantic sentence similarity