Novel Unsupervised Auditory Filterbank Learning Using Convolutional RBM for Speech Recognition.
IEEE/ACM Trans. Audio, Speech & Language Processing (2016)
Abstract
We recently proposed an unsupervised learning model for auditory filterbanks based on a convolutional restricted Boltzmann machine (RBM) with rectified linear units. This paper presents the theory and training algorithm of the proposed model, together with a detailed analysis of the learned filterbank. Training the model on different databases shows that it learns cochlear-like impulse responses that are localized in the frequency domain. The auditory-like scale obtained from filterbanks learned on clean and noisy datasets resembles the Mel scale, which is known to mimic perceptually relevant aspects of speech. We experimented with both cepstral features (denoted ConvRBM-CC) and filterbank features (denoted ConvRBM-BANK). On a large vocabulary continuous speech recognition task, we achieved a relative improvement in word error rate (WER) of 7.21-17.8% compared to Mel frequency cepstral coefficient (MFCC) features and 1.35-6.82% compared to Mel filterbank (FBANK) features. On the AURORA 4 multicondition training database, a relative WER improvement of 4.8-13.65% was achieved using a hybrid deep neural network-hidden Markov model (DNN-HMM) system with ConvRBM-CC features. Using ConvRBM-BANK features, we achieved an absolute WER reduction of 1.25-3.85% on the AURORA 4 test sets compared to FBANK features. A context-dependent DNN-HMM system further improves performance, with an average relative improvement of 3.6-4.6% for the bigram 5k and trigram 5k language models. Hence, the proposed learned filterbank performs better than traditional MFCC and Mel filterbank features on both clean and multicondition automatic speech recognition (ASR) tasks. A system combination of ConvRBM-BANK and FBANK features further improves performance in all ASR tasks. Cross-domain experiments, in which subband filters trained on one database are used for the ASR task of another database, show that the model learns generalized representations of speech signals.
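The abstract reports some gains as relative WER improvements and others as absolute WER reductions, and the two are not directly comparable. The sketch below shows how each is conventionally computed. The function names and the numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Absolute WER reduction, in percentage points."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical values for illustration only (not results from the paper).
baseline = 20.0  # baseline WER in percent, e.g. with a reference feature set
proposed = 18.0  # WER in percent with an alternative feature set

print(absolute_reduction(baseline, proposed))  # 2.0 (percentage points)
print(relative_reduction(baseline, proposed))  # 10.0 (percent of baseline)
```

With these made-up values, the same change reads as a 2-point absolute reduction or a 10% relative improvement, which is why the abstract labels each figure explicitly.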
Keywords
Speech processing, Hidden Markov models, Convolution, Databases, Mel frequency cepstral coefficient, Mathematical model