YODAS: Youtube-Oriented Dataset for Audio and Speech

Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe

Automatic Speech Recognition & Understanding (2024)

Abstract

In this study, we introduce YODAS (YouTube-Oriented Dataset for Audio and Speech), a large-scale, multilingual dataset that currently comprises over 500k hours of speech data in more than 100 languages, sourced from both labeled and unlabeled YouTube speech datasets. The labeled subsets, which include manual or automatic subtitles, facilitate supervised model training, while the unlabeled subsets are suited to self-supervised learning. YODAS is distinctive as the first publicly available dataset of its scale, and it is distributed under a Creative Commons license. We describe the collection methodology used for YODAS, which contributes to large-scale speech dataset construction. We then provide a comprehensive analysis of the speech and text contained within the dataset. Finally, we describe speech recognition baselines over the top-15 languages.

Keywords

multilingual speech processing, speech recognition, large-scale speech dataset
