Leveraging Language ID in Multilingual End-to-End Speech Recognition

Austin Waters, Neeraj Gaur, Parisa Haghani, Pedro Moreno, Zhongdi Qu

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019)

Abstract

Recent advances in end-to-end speech recognition have made it possible to build multilingual models capable of recognizing speech in multiple languages. Multilingual models can outperform their monolingual counterparts, depending on the amount of training data and the relatedness of the languages. However, in some cases, these models rely on having perfect knowledge of the language being spoken; that is, they expect to be provided with an external language ID that augments the input features or modulates internal layers of the network. In this paper, we introduce a novel technique for inferring the language ID in a streaming fashion using RNN-T, and a novel loss function that pressures the model to identify the language after as few frames as possible. The output of this streaming language-ID model is used in training and inference of a multilingual recognition model. We show the effectiveness of our approach through experiments on two sets of languages, one consisting of different dialects of Arabic, and the other consisting of Nordic languages, Finnish and Dutch.

Keywords

end-to-end speech recognition, multilingual, RNN-T, language ID
