KALAKA-3: a database for the assessment of spoken language recognition technology on YouTube audios

Language Resources and Evaluation (2015)

Abstract
KALAKA-3 is a speech database specifically designed for the development and evaluation of Spoken Language Recognition (SLR) systems. The database provides TV broadcast speech for training, and audio data extracted from YouTube videos for tuning and testing. The database was created to support the Albayzin 2012 Language Recognition Evaluation (LRE), which featured two language recognition tasks, both dealing with European languages. The first one involved six target languages (Basque, Catalan, English, Galician, Portuguese and Spanish) for which there was plenty of training data, whereas the second one involved four target languages (French, German, Greek and Italian) for which no training data was provided. This second task tried to simulate the use case of low-resource languages. Two separate sets of YouTube audio files were provided to test the performance of language recognition systems on both tasks. To allow open-set tests, these datasets included speech in 11 additional (Out-Of-Set) European languages. In this paper, we first discuss the design issues considered when creating the database and describe the data collection procedure. Then, we present the results attained in the Albayzin 2012 LRE, along with the performance of state-of-the-art systems on the four evaluation tracks defined on the database. Both series of results demonstrate the usefulness of KALAKA-3 as a challenging benchmark for the advancement of SLR technology. As far as we know, this is the first database specifically designed to benchmark SLR technology on YouTube audios.
Keywords
Spoken language recognition, YouTube audio, Broadcast speech, European languages, Low-resource languages