A Novel Approach To Soft-Mask Estimation And Log-Spectral Enhancement For Robust Speech Recognition
ICASSP (2012)
Abstract
This paper describes a technique for enhancing the Mel-filtered log spectra of noisy speech, with application to noise-robust speech recognition. We first compute an SNR-based soft-decision mask in the Mel-spectral domain as an indicator of speech presence. Then, we exploit the known time-frequency correlation of speech by treating this mask as an image and applying median filtering and blurring to remove outliers and smooth the decision regions. The mask constitutes a set of multiplicative coefficients (in the range [0, 1]) that are used to discard the unreliable parts of the Mel-filtered log-spectrum of noisy speech. Finally, we apply Log-Spectral Flooring [1] to the liftered spectra of both clean and noisy speech so as to match their respective dynamic ranges and to emphasize the information in the spectral peaks. The noisy MFCCs computed on these modified log-spectra show increased similarity to their corresponding clean MFCCs. Evaluation on the Aurora-2 corpus shows that the proposed approach is competitive with state-of-the-art front-ends such as ETSI-AFE, MVA, and PNCC.
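The pipeline in the abstract (soft mask, image-style smoothing, multiplicative masking of the log-spectrum, flooring) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give the mask formula, the filter sizes, the blur kernel, or the flooring rule, so the sigmoid mapping, the 3x3 windows, the box blur, the simple clamp used as a floor, and all function and parameter names below are assumptions chosen for illustration. Inputs are assumed to be Mel-filtered power spectra stored as lists of rows (channels by frames), together with a noise estimate of the same shape.

```python
import math
from statistics import median


def soft_mask(noisy_mel, noise_mel, slope=1.0, snr_mid_db=0.0):
    """Map a per-bin SNR estimate (dB) to [0, 1] with a sigmoid (assumed form)."""
    mask = []
    for y_row, n_row in zip(noisy_mel, noise_mel):
        row = []
        for y, n in zip(y_row, n_row):
            # Crude SNR estimate: (noisy - noise) / noise, guarded against <= 0.
            snr_db = 10.0 * math.log10(max(y - n, 1e-10) / max(n, 1e-10))
            # Clamp the sigmoid argument so math.exp cannot overflow.
            z = max(-50.0, min(50.0, slope * (snr_db - snr_mid_db)))
            row.append(1.0 / (1.0 + math.exp(-z)))
        mask.append(row)
    return mask


def filter2d(img, reduce_fn, radius=1):
    """Apply reduce_fn over a (2*radius+1)^2 window, truncated at the borders."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [img[a][b]
                      for a in range(max(0, i - radius), min(rows, i + radius + 1))
                      for b in range(max(0, j - radius), min(cols, j + radius + 1))]
            out[i][j] = reduce_fn(window)
    return out


def mean(values):
    return sum(values) / len(values)


def enhance(noisy_mel, noise_mel, floor=0.0):
    """Mask the noisy log Mel spectrum, then clamp it to a floor (assumed rule)."""
    mask = soft_mask(noisy_mel, noise_mel)
    mask = filter2d(mask, median)  # median filtering removes isolated outliers
    mask = filter2d(mask, mean)    # box blur (assumed kernel) smooths the regions
    return [[max(m * math.log(max(y, 1e-10)), floor)
             for m, y in zip(m_row, y_row)]
            for m_row, y_row in zip(mask, noisy_mel)]
```

In this sketch, a bin whose estimated SNR equals `snr_mid_db` receives a mask value of 0.5, and an isolated high mask value surrounded by low ones is removed by the median step before the blur is applied. MFCCs would then be computed from the output of `enhance` in the usual way (DCT of the modified log-spectrum), which is outside the scope of this example.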
Keywords
Speech Recognition, Feature Extraction, Speech Enhancement, Mask Estimation, Median Filtering