3-D Acoustic Modeling For Far-Field Multi-Channel Speech Recognition

2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Abstract
The conventional approach to automatic speech recognition (ASR) in multi-channel reverberant conditions involves beamforming-based enhancement of the multi-channel speech signal followed by a single-channel neural acoustic model. In this paper, we propose to model the multi-channel signal directly using a convolutional neural network (CNN) based architecture that performs joint acoustic modeling over the three dimensions of time, frequency, and channel. The features input to the 3-D CNN are extracted by modeling the signal peaks in the spatio-spectral domain using a multivariate autoregressive (AR) modeling approach. This AR model is efficient in capturing the channel correlations in the frequency domain of the multi-channel signal. The experiments are conducted on the CHiME-3 and REVERB Challenge datasets using multi-channel reverberant speech. In these experiments, the proposed 3-D feature and acoustic modeling approach provides significant improvements over an ASR system trained on beamformed audio (average relative improvements in word error rate of 16% and 6% for the CHiME-3 and REVERB Challenge datasets, respectively).
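The paper itself does not include an implementation; the following is a minimal sketch, in PyTorch, of how a 3-D CNN acoustic model that convolves jointly over time, frequency, and microphone channel might be structured. All layer counts, kernel sizes, feature dimensions, and the senone count are illustrative assumptions, not the authors' configuration; only the 6-microphone CHiME-3 array size is taken from the dataset.

```python
# Sketch (not the authors' implementation) of a 3-D CNN acoustic model
# operating jointly on time, frequency, and microphone channel.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

class AcousticModel3D(nn.Module):
    def __init__(self, num_senones: int = 3000):
        super().__init__()
        # Expected input: (batch, 1, time_frames, freq_bins, mic_channels)
        self.conv = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(5, 5, 2)),   # joint time-frequency-channel convolution
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 1)),       # pool along frequency only
            nn.Conv3d(32, 64, kernel_size=(3, 3, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((1, 4, 1)),           # collapse time and channel axes
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4, 1024),
            nn.ReLU(),
            nn.Linear(1024, num_senones),              # senone posteriors for the ASR decoder
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.conv(x))

if __name__ == "__main__":
    # Example: batch of 8 contexts, 21 frames, 36 sub-band features, 6 mics (CHiME-3 array).
    feats = torch.randn(8, 1, 21, 36, 6)
    logits = AcousticModel3D()(feats)
    print(logits.shape)  # torch.Size([8, 3000])
```

In this sketch the spatio-spectral AR features described in the abstract would be stacked along the last two axes (frequency and channel) before being fed to the network; the joint 3-D convolution is what replaces the separate beamforming front-end of the conventional pipeline.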
Keywords
Spatio-spectral Autoregressive Modeling (SSAR), Multi-channel signal processing, Beamforming, 3-D CNN modeling, Automatic Speech Recognition