Unsupervised Regularization-Based Adaptive Training for Speech Recognition.

INTERSPEECH (2020)

Abstract
In this paper, we propose two novel regularization-based speaker adaptive training approaches for connectionist temporal classification (CTC) based speech recognition. The first method is center loss (CL) regularization, which penalizes the distances between speaker embeddings and a single shared center. The second method is speaker variance loss (SVL) regularization, in which we directly minimize the inter-speaker variance during model training. Both methods train an adaptive model on the fly by adding regularization terms to the training loss function. Our experiments on the AISHELL-1 Mandarin recognition task show that both methods are effective at adapting the CTC model without requiring any speaker-specific fine-tuning or additional complexity, achieving character error rate improvements of up to 8.1% and 8.6%, respectively, over the speaker independent (SI) model.
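The two regularizers described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the embedding shapes, the `lam` weight, and the helper names are assumptions, and the CTC term is taken as an already-computed scalar.

```python
import numpy as np

def center_loss(embeddings, center):
    # CL regularizer (sketch): mean squared Euclidean distance of each
    # speaker embedding to a single shared center.
    # embeddings: (num_utterances, dim), center: (dim,)
    return np.mean(np.sum((embeddings - center) ** 2, axis=1))

def speaker_variance_loss(embeddings, speaker_ids):
    # SVL regularizer (sketch): variance of the per-speaker mean embeddings,
    # i.e. the inter-speaker variance to be minimized during training.
    means = np.stack([embeddings[speaker_ids == s].mean(axis=0)
                      for s in np.unique(speaker_ids)])
    return np.mean(np.sum((means - means.mean(axis=0)) ** 2, axis=1))

def total_loss(ctc_loss, reg_loss, lam=0.01):
    # Adaptive training objective: CTC loss plus a weighted regularizer.
    # lam is a hypothetical trade-off weight, not a value from the paper.
    return ctc_loss + lam * reg_loss
```

In training, the regularizer is simply added to the CTC objective each step, which is why no separate adaptation pass or fine-tuning stage is needed.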
Keywords
speaker adaptive training, regularization, speech recognition, connectionist temporal classification