MSVD-Turkish: A Large-Scale Dataset for Video Captioning in Turkish

Begüm Çıtamak, Menekşe Kuyu, Aykut Erdem, Erkut Erdem

2019 27th Signal Processing and Communications Applications Conference (SIU) (2019)

Abstract

Automatically generating natural language descriptions for videos, also known as video captioning, has recently been introduced as a challenging integrated vision and language problem. Although researchers have demonstrated numerous solutions for English, to date there has been no study on the Turkish language, owing to the lack of suitable datasets for training Turkish video captioning models. To address this, we construct a large-scale Turkish benchmark dataset by carefully translating the English descriptions in the MSVD dataset into Turkish. We also implement several neural models, including LSTM-based sequence-to-sequence architectures with temporal attention mechanisms, and report the performance of these strong baselines on our dataset. We hope that our dataset will serve as a good resource for future efforts on Turkish video captioning.

Keywords

Video captioning, computer vision, natural language processing, machine learning
