On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones

Yanhui Tu, Jun Du, Lei Sun, Feng Ma, Chin-Hui Lee

18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction (2017)



Abstract

We design a novel deep learning framework for multi-channel speech recognition that contributes in two areas. First, in the front-end, an iterative mask estimation (IME) approach based on deep learning is presented to improve beamforming based on the conventional complex Gaussian mixture model (CGMM). Second, in the back-end, deep convolutional neural networks (DCNNs), trained with augmentation of both noisy and beamformed data, are adopted for acoustic modeling, while forward and backward long short-term memory recurrent neural networks (LSTM-RNNs) are used for language modeling. The proposed framework is effective for multi-channel speech recognition with random combinations of fixed microphones. Tested on the CHiME-4 Challenge speech recognition task with a single set of acoustic and language models, our approach achieves the best performance among submitted systems in all three tracks (1-channel, 2-channel, and 6-channel).


Keywords

CHiME challenge, deep learning, mask estimation, microphone array, robust speech recognition
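
The abstract describes a front-end in which time-frequency masks drive a beamformer. As a rough illustration only, the sketch below shows how a speech mask could be turned into beamformer weights with NumPy. It assumes the mask is already available (the iterative CGMM and neural-network estimation itself is not reproduced) and it assumes an MVDR beamformer in the reference-channel formulation, which the abstract does not specify. All function names, array shapes, and the `eps` loading constant are hypothetical choices made for this example.

```python
import numpy as np


def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance per frequency bin.

    stft: complex array of shape (F, T, C) = (frequency, frame, channel).
    mask: real array of shape (F, T) with values in [0, 1].
    Returns an array of shape (F, C, C).
    """
    weighted = stft * mask[..., None]
    # Sum over frames of mask * y y^H, computed for every frequency bin.
    cov = np.einsum("ftc,ftd->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)
    return cov / norm[:, None, None]


def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-6):
    """MVDR weights w = (Phi_n^-1 Phi_s) u / trace(Phi_n^-1 Phi_s).

    u selects the reference channel `ref` (an assumption of this sketch).
    Returns an array of shape (F, C).
    """
    n_ch = phi_n.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    load = eps * np.trace(phi_n, axis1=-2, axis2=-1).real / n_ch
    phi_n = phi_n + (load[:, None, None] + 1e-10) * np.eye(n_ch)
    numerator = np.linalg.solve(phi_n, phi_s)  # Phi_n^-1 Phi_s, shape (F, C, C)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    return numerator[..., ref] / (trace[:, None] + 1e-10)


def apply_beamformer(weights, stft):
    """Compute w^H y for every time-frequency point; returns shape (F, T)."""
    return np.einsum("fc,ftc->ft", weights.conj(), stft)


def beamform(stft, speech_mask, ref=0):
    """Beamform a multi-channel STFT from a speech mask.

    The noise mask is taken as 1 - speech_mask, a simplification of this sketch.
    """
    phi_s = masked_covariance(stft, speech_mask)
    phi_n = masked_covariance(stft, 1.0 - speech_mask)
    return apply_beamformer(mvdr_weights(phi_s, phi_n, ref=ref), stft)
```

In an iterative scheme such as the one the abstract names, the beamformed output would presumably feed a further round of mask estimation. That loop is omitted here because the abstract gives no details about it.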
