MSVD-Turkish: A Large-Scale Dataset for Video Captioning in Turkish
2019 27th Signal Processing and Communications Applications Conference (SIU), 2019
Abstract
Automatically generating natural language descriptions for videos, also known as video captioning, has recently been introduced as a challenging integrated vision and language problem. Although researchers have demonstrated numerous solutions for English, to date there has been no study on Turkish, owing to the lack of suitable datasets for training Turkish video captioning models. To address this, we construct a large-scale Turkish benchmark dataset by carefully translating the English descriptions in the MSVD dataset into Turkish. Moreover, we implement several neural models, including LSTM-based sequence-to-sequence architectures with temporal attention mechanisms, and report the performance of these strong baselines on our dataset. We hope that our dataset will serve as a good resource for future efforts on Turkish video captioning.
Keywords
Video captioning, computer vision, natural language processing, machine learning