Correlated Multi-Level Speech Enhancement for Robust Real-World ASR Applications Using Mask-Waveform-Feature Optimization
2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2023
Abstract
We propose a correlated multi-level optimization approach that improves speech recognition for high-performance acoustic models in real-world applications. The approach combines three loss functions: the mean squared error (MSE) of the mask, the scale-invariant source-to-noise ratio (SI-SNR), and cross-entropy (CE). It also adds the Pearson correlation coefficient to the optimization goal as a measure of the correlation between these losses, so that training both reduces each loss and increases the correlation among them. Experiments on continuous Mandarin recognition in mobile phone scenarios show a relative reduction of about 25.29% in the average character error rate (CER) across five signal-to-noise ratio levels. The approach improves objective perceptual quality and intelligibility measures as well as recognition accuracy, and it surpasses some advanced speech enhancement techniques in the context of automatic speech recognition (ASR).
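The abstract does not give the exact form of the objective, so the following is only a minimal sketch of how a mask-level, waveform-level, and feature-level loss might be combined with a Pearson correlation term. The function names, the `corr_weight` parameter, the use of per-utterance losses within a batch, and the averaging over the three loss pairs are all assumptions made for illustration, not the authors' formulation.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length waveforms."""
    n = len(target)
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]  # zero-mean estimate
    t = [x - mean_t for x in target]    # zero-mean target
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t) + eps
    proj = [dot / t_energy * b for b in t]      # part of e aligned with t
    noise = [a - p for a, p in zip(e, proj)]    # residual
    proj_energy = sum(p * p for p in proj)
    noise_energy = sum(x * x for x in noise)
    return 10 * math.log10((proj_energy + eps) / (noise_energy + eps))


def pearson(xs, ys, eps=1e-8):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (math.sqrt(vx * vy) + eps)


def correlated_multi_level_loss(mask_mse, neg_si_snr, ce, corr_weight=0.1):
    """Hypothetical combined objective over per-utterance losses in a batch.

    mask_mse, neg_si_snr, ce: lists with one loss value per utterance.
    corr_weight: assumed weight of the correlation penalty.
    """
    n = len(mask_mse)
    # Sum of the three mean losses (equal weighting is an assumption).
    base = (sum(mask_mse) + sum(neg_si_snr) + sum(ce)) / n
    # Penalize low correlation between each pair of losses.
    pairs = [(mask_mse, neg_si_snr), (mask_mse, ce), (neg_si_snr, ce)]
    mean_corr = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    return base + corr_weight * (1.0 - mean_corr)
```

In this sketch, SI-SNR enters as a loss by negating it (`neg_si_snr`), and the penalty `1 - mean_corr` shrinks toward zero as the three losses become more strongly correlated across the batch. A real training setup would compute these terms on tensors in an automatic differentiation framework.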