Unsupervised Beamforming Based On Multichannel Nonnegative Matrix Factorization For Noisy Speech Recognition
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018
Abstract
This paper presents an unsupervised multichannel speech enhancement method for noisy speech recognition. Time-frequency (TF) mask estimation has been actively studied as a way to estimate the steering vectors and spatial covariance matrices of speech and noise that are used for beamforming. The state-of-the-art approach to mask estimation uses deep neural networks (DNNs) to classify the TF bins of the observed signals into speech and noise. Such a supervised approach, however, does not work well in an unknown environment. To accurately estimate the spatial covariance matrices in an unsupervised manner, we perform blind source separation (BSS) based on multichannel nonnegative matrix factorization (MNMF), which decomposes each TF bin into the components of speech and the other sources (noise). To clarify which type of beamforming is suitable for MNMF, we tested both time-invariant and time-varying versions of minimum variance distortionless response (MVDR) beamforming in addition to standard multichannel Wiener filtering (MWF). The experimental results showed that our MNMF-based beamforming approach outperformed the state-of-the-art DNN-based beamforming method in unknown environments that do not match the training data.
Keywords
Noisy speech recognition, speech enhancement, multichannel nonnegative matrix factorization, beamforming
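The time-invariant MVDR step mentioned in the abstract can be illustrated with a minimal NumPy sketch for a single frequency bin. This is not the paper's implementation: it assumes the speech and noise spatial covariance matrices are already available (for example from an MNMF or mask-based front end, which is not shown), takes the principal eigenvector of the speech covariance as the steering vector, and adds diagonal loading for numerical stability. The function names `mvdr_weights` and `apply_beamformer` and the `eps` parameter are hypothetical.

```python
import numpy as np


def mvdr_weights(scm_speech, scm_noise, eps=1e-6):
    """Time-invariant MVDR weights for one frequency bin (illustrative sketch).

    scm_speech, scm_noise: (M, M) Hermitian spatial covariance matrices,
    assumed to be estimated beforehand. Returns (weights, steering).
    """
    n_mics = scm_noise.shape[0]

    # Assumption: the steering vector is the principal eigenvector of the
    # speech covariance. eigh returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(scm_speech)
    steering = eigvecs[:, -1]

    # Assumption: diagonal loading scaled by the mean noise power keeps the
    # linear solve well conditioned.
    loading = eps * np.trace(scm_noise).real / n_mics
    loaded_noise = scm_noise + loading * np.eye(n_mics)

    # w = R_n^{-1} h / (h^H R_n^{-1} h), so that w^H h = 1 (distortionless).
    numerator = np.linalg.solve(loaded_noise, steering)
    denominator = steering.conj() @ numerator
    return numerator / denominator, steering


def apply_beamformer(weights, observation):
    """Apply y[t] = w^H x[t] to an (M, T) block of STFT frames."""
    return weights.conj() @ observation


if __name__ == "__main__":
    # Synthetic data only, to show the call pattern.
    rng = np.random.default_rng(0)
    n_mics, n_frames = 4, 200
    h = rng.standard_normal(n_mics) + 1j * rng.standard_normal(n_mics)
    noise = rng.standard_normal((n_mics, n_frames)) \
        + 1j * rng.standard_normal((n_mics, n_frames))
    scm_s = np.outer(h, h.conj())
    scm_n = noise @ noise.conj().T / n_frames
    w, steer = mvdr_weights(scm_s, scm_n)
    enhanced = apply_beamformer(w, noise)
    print(enhanced.shape)
```

A time-varying variant would recompute the noise covariance, and hence the weights, per frame or per block rather than once per frequency bin; a full system would also loop over all frequency bins.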