MLS: A Large-Scale Multilingual Dataset for Speech Research
INTERSPEECH, pp. 2757-2761, 2020.
This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages, including about 44.5K hours of English and a total of about 6K hours for the other languages. Additionally, we provide Language Models ...