A Multi-Accent Acoustic Model Using Mixture of Experts for Speech Recognition.

Interspeech (2019)

Cited by 30 | Viewed 3
Abstract
A major challenge in Automatic Speech Recognition (ASR) systems is handling speech from a diverse set of accents. A model trained on a single accent performs rather poorly when confronted with different accents. One solution is a multi-condition model trained on all the accents; however, the performance improvement in this approach can be rather limited. Alternatively, accent-specific models can be trained, but these become impractical as the number of accents increases. In this paper, we propose a novel acoustic model architecture based on Mixture of Experts (MoE) that works well on multiple accents without the overhead of training separate models for separate accents. The work builds on our earlier work, termed MixNet, where we showed performance improvement by separating phonetic class distributions in the feature space. Here, we propose an architecture that compensates for both phonetic and accent variabilities, which yields even better discrimination among the classes. These variabilities are learned in a joint framework and produce consistent improvements across all the individual accents, amounting to an overall 18% relative improvement in accuracy compared to a baseline trained in multi-condition style.
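The core MoE mechanism the abstract describes — a gating network that produces per-frame weights over several expert networks, whose outputs are then combined — can be sketched as follows. All layer sizes, the single-layer expert form, and the class names here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # numerically stable softmax along the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class Expert:
    """One expert network (a single tanh layer here, purely for illustration)."""
    def __init__(self, dim_in, dim_out):
        self.w = rng.standard_normal((dim_in, dim_out)) * 0.1

    def __call__(self, x):
        return np.tanh(x @ self.w)

class MoE:
    """Mixture of Experts: a gate assigns per-frame weights to each expert,
    and the layer output is the gate-weighted sum of expert outputs."""
    def __init__(self, dim_in, dim_out, n_experts):
        self.experts = [Expert(dim_in, dim_out) for _ in range(n_experts)]
        self.gate_w = rng.standard_normal((dim_in, n_experts)) * 0.1

    def __call__(self, x):
        gates = softmax(x @ self.gate_w)                       # (batch, n_experts)
        outs = np.stack([e(x) for e in self.experts], axis=1)  # (batch, n_experts, dim_out)
        return np.einsum('be,bed->bd', gates, outs)            # (batch, dim_out)

# Hypothetical usage: 5 acoustic frames of 40-dim features, 4 experts
moe = MoE(dim_in=40, dim_out=8, n_experts=4)
frames = rng.standard_normal((5, 40))
y = moe(frames)
print(y.shape)  # (5, 8)
```

In a multi-accent setting, the intuition is that the gate can learn to route frames from different accents toward different experts (or mixtures of experts), so a single jointly trained model covers all accents without separate per-accent networks.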
Keywords
multi-accent acoustic model,mixture of experts,deep learning,automatic speech recognition