MLS: A Large-Scale Multilingual Dataset for Speech Research

Vineel Pratap
Anuroop Sriram

INTERSPEECH, pp. 2757-2761, 2020.

Abstract:

This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and covers 8 languages: about 44.5K hours of English and a total of about 6K hours across the other languages. Additionally, we provide Language Models ...