BembaSpeech: A Speech Recognition Corpus for the Bemba Language.

International Conference on Language Resources and Evaluation (LREC) (2022)

Abstract
We present BembaSpeech, a preprocessed, ready-to-use automatic speech recognition (ASR) corpus consisting of over 24 hours of read speech in the Bemba language, a written but low-resourced language spoken by over 30% of the population in Zambia. To assess its usefulness for training and testing ASR systems for Bemba, we train an end-to-end Bemba ASR system by fine-tuning a pre-trained DeepSpeech English model on the training portion of the BembaSpeech corpus. Our best model achieves a word error rate (WER) of 54.78%. The results show that the corpus can be used for building ASR systems for Bemba. The corpus and models are publicly released at https://github.com/csikasote/BembaSpeech.
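For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used for BembaSpeech; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and the example strings are placeholders rather than corpus data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens: one substitution and one deletion over four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.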
Keywords
Automatic Speech Recognition, ASR corpus, Low-resource language, Bemba language, Zambia