A reweighting method for speech recognition with imbalanced data of Mandarin and sub-dialects

Jiaju Wu, Zhengchang Wen, Haitian Huang, Hanjing Su, Fei Liu, Huan Wang, Yi Ding, Qingyao Wu

Service Oriented Computing and Applications (2024)

Abstract
Automatic speech recognition (ASR) is an important technology in many fields, such as video-sharing services, online education and live broadcasting. Most recent ASR methods are based on deep learning. A dataset containing training samples of standard Mandarin and its sub-dialects can be used to train a neural network-based ASR model that recognizes both. Because the cost of collecting data differs across sub-dialects, the number of standard Mandarin training samples in such a dataset is usually much larger than the number of sub-dialect samples, so the model recognizes standard Mandarin much better than the sub-dialects. In this paper, to improve recognition performance for sub-dialects, we propose to reweight the recognition loss for each sub-dialect based on its similarity to standard Mandarin. The proposed reweighting method makes the model pay more attention to sub-dialects with larger loss weights, alleviating the problem of poor recognition performance for sub-dialects. Our model was trained and validated on KeSpeech, an open-source dataset that includes standard Mandarin and eight of its sub-dialects. Experimental results show that the proposed model recognizes most sub-dialects better than the baseline and achieves a Character Error Rate about 0.5 lower than the baseline.
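The abstract does not give the weighting formula, so the following is only a minimal sketch of per-dialect loss reweighting under stated assumptions. The rule `weight = 1 + alpha * (1 - similarity)`, the direction of the weighting (less similar dialects get larger weights), the parameter `alpha`, the similarity scores, and the dialect names are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Tuple


def dialect_weights(similarity: Dict[str, float], alpha: float = 1.0) -> Dict[str, float]:
    """Map each dialect's similarity to standard Mandarin (in [0, 1]) to a loss weight.

    Hypothetical rule (an assumption, not the paper's formula):
        weight = 1 + alpha * (1 - similarity)
    Standard Mandarin (similarity 1.0) keeps weight 1.0.
    """
    return {name: 1.0 + alpha * (1.0 - sim) for name, sim in similarity.items()}


def reweighted_loss(samples: List[Tuple[str, float]], weights: Dict[str, float]) -> float:
    """Weighted mean of per-utterance losses, normalized by the sum of weights."""
    total = sum(weights[dialect] * loss for dialect, loss in samples)
    norm = sum(weights[dialect] for dialect, _ in samples)
    return total / norm


# Illustrative similarity scores (made up for this sketch).
similarity = {"Mandarin": 1.0, "dialect_a": 0.8, "dialect_b": 0.5}
weights = dialect_weights(similarity, alpha=1.0)

# A toy batch of (dialect label, per-utterance recognition loss) pairs.
batch = [("Mandarin", 0.2), ("dialect_a", 0.6), ("dialect_b", 1.0)]
loss = reweighted_loss(batch, weights)
unweighted = sum(l for _, l in batch) / len(batch)
```

In this toy batch the sub-dialect utterances have the larger losses and the larger weights, so the reweighted loss is higher than the plain mean, and gradient updates would be pulled more strongly toward the sub-dialect samples.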
Keywords
Automatic speech recognition, Imbalanced data, Dialect recognition