SVitchboard II and FiSVer I: High-Quality Limited-Complexity Corpora of Conversational English Speech

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5(2015)

Abstract
In this paper, we introduce a set of benchmark corpora of conversational English speech derived from the Switchboard-I and Fisher datasets. Traditional ASR research requires considerable computational resources and has slow experimental turnaround times. Our goal is to introduce these new datasets to researchers in the ASR and machine learning communities (especially in academia), in order to facilitate the development of novel acoustic modeling techniques on smaller but acoustically rich corpora. We select these corpora to maximize an acoustic quality criterion while limiting the vocabulary size (from 10 words up to 10,000 words) with different state-of-the-art submodular function optimization algorithms. We provide baseline word recognition results for both GMM and DNN-based systems and release the corpora definitions and Kaldi training recipes to the public.
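The selection problem described in the abstract (maximize acoustic quality while capping vocabulary size) can be illustrated with a toy greedy heuristic. This is only a minimal sketch under stated assumptions, not the submodular optimization algorithms the paper uses: the function name `greedy_select`, the per-utterance quality scores, and the quality-per-new-word ratio rule are all hypothetical choices made for illustration, and the data are invented.

```python
from typing import Dict, List, Set, Tuple

# Each utterance maps to (quality score, set of word types it contains).
# Scores are assumed non-negative; how they are computed is outside this sketch.
Utterances = Dict[str, Tuple[float, Set[str]]]


def greedy_select(utterances: Utterances, max_vocab: int) -> Tuple[List[str], Set[str]]:
    """Greedily pick utterances by quality per newly added word type,
    never letting the selected vocabulary exceed max_vocab word types."""
    selected: List[str] = []
    vocab: Set[str] = set()
    remaining = dict(utterances)

    while remaining:
        best_id, best_ratio = None, -1.0
        for utt_id, (quality, words) in remaining.items():
            new_words = words - vocab
            # Skip utterances that would push the vocabulary over budget.
            if len(vocab) + len(new_words) > max_vocab:
                continue
            # Utterances that add no new words get the full quality as ratio.
            ratio = quality / (1 + len(new_words))
            if ratio > best_ratio:
                best_id, best_ratio = utt_id, ratio
        if best_id is None:
            break  # nothing left that fits within the vocabulary budget
        _, words = remaining.pop(best_id)
        selected.append(best_id)
        vocab |= words

    return selected, vocab


# Invented toy data: utterance id -> (quality, word types).
toy: Utterances = {
    "u1": (3.0, {"yes", "no"}),
    "u2": (1.5, {"yes"}),
    "u3": (5.0, {"well", "okay", "right", "yes"}),
    "u4": (1.0, {"no"}),
}
selected, vocab = greedy_select(toy, max_vocab=2)
```

With a budget of two word types, the high-quality utterance `u3` is excluded because it alone would require four word types. This mirrors the trade-off the abstract describes, where the vocabulary limit ranges from 10 to 10,000 words.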
Keywords
speech recognition, acoustic modeling, submodular optimization