Correlated Multi-Level Speech Enhancement for Robust Real-World ASR Applications Using Mask-Waveform-Feature Optimization

Hang Chen, Jun Du, Zhe Wang, Chenxi Wang, Yuling Ren, Qinglong Li, Ruibo Liu, Chin-Hui Lee

2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2023

Abstract
We propose a correlated multi-level optimization approach to speech enhancement that improves recognition performance for high-performance acoustic models in real-world applications. The approach combines three loss functions: the mean squared error (MSE) of the mask, the scale-invariant source-to-noise ratio (SI-SNR), and the cross-entropy (CE) loss. It also adopts the Pearson correlation coefficient between these losses as part of the optimization goal, so that training aims not only to reduce each loss but also to increase the correlation between them. Experimental results on continuous Mandarin recognition in mobile phone scenarios show a relative reduction of about 25.29% in the average character error rate across five signal-to-noise ratio levels. Notably, our approach improves objective perceptual quality and intelligibility measures as well as recognition accuracy, surpassing some advanced speech enhancement techniques in the context of automatic speech recognition.