Non-linear dimension reduction of Gabor features for noise-robust ASR

Acoustics, Speech and Signal Processing (2014)

Abstract
It has been shown that Gabor filters closely resemble the spectro-temporal response fields of neurons in the primary auditory cortex. A bank of 2-D Gabor filters can be applied to either the mel-spectrogram or the power-normalized spectrogram to obtain a set of physiologically inspired Gabor Filter Bank Features. The high dimensionality and correlated nature of these features pose a problem for automatic speech recognition (ASR). In the past, dimension reduction was performed through (1) feature selection, (2) channel selection, (3) linear dimension reduction or (4) tandem acoustic modelling. In this paper, we propose a novel solution to this problem based on channel selection and non-linear dimension reduction using Laplacian Eigenmaps. The reduced features are concatenated with Power Normalized Cepstral Coefficients (PNCC) to evaluate whether the two are complementary and improve performance. On the Aurora 4 database, we show a 12.66% relative reduction in word error rate (WER) compared to the PNCC baseline.
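For readers unfamiliar with Laplacian Eigenmaps, the following is a minimal sketch of the generic algorithm, not the authors' implementation. The k-nearest-neighbour graph, the heat-kernel weights, the median bandwidth heuristic, and the function name `laplacian_eigenmaps` are all assumptions made for illustration; the abstract does not specify any of them. The random matrix stands in for a set of high-dimensional feature frames.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist


def laplacian_eigenmaps(X, n_components=2, n_neighbors=10):
    """Embed the rows of X with Laplacian Eigenmaps (illustrative sketch)."""
    n = X.shape[0]
    d2 = cdist(X, X, "sqeuclidean")

    # k nearest neighbours of each point (column 0 is the point itself).
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()

    # Heat-kernel weights; the bandwidth is the median neighbour distance
    # (an assumed heuristic, not taken from the paper).
    sigma2 = np.median(d2[rows, cols])
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / (2.0 * sigma2))
    W = np.maximum(W, W.T)  # make the graph symmetric

    # Graph Laplacian L = D - W, with D the diagonal degree matrix.
    D = np.diag(W.sum(axis=1))
    L = D - W

    # Generalized eigenproblem L y = lambda D y; eigenvalues come back ascending.
    vals, vecs = eigh(L, D)

    # Skip the first (constant) eigenvector and keep the next n_components.
    return vecs[:, 1:n_components + 1], vals[1:n_components + 1]


# Stand-in data: 60 frames of 20-dimensional features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
Y, vals = laplacian_eigenmaps(X, n_components=3)
```

Note that this basic form only embeds the points used to build the graph; mapping unseen frames requires an out-of-sample extension, which this sketch does not cover.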
Keywords
Gabor filters, channel bank filters, eigenvalues and eigenfunctions, feature extraction, feature selection, speech recognition, 2D Gabor filters, Aurora 4 database, Gabor filter bank features, Laplacian eigenmaps, PNCC baseline, WER, automatic speech recognition, channel selection, mel-spectrogram, noise-robust ASR, nonlinear dimension reduction, power normalized cepstral coefficients, power normalized spectrogram, primary auditory cortex, spectrotemporal response fields, tandem acoustic modelling, word error rate, Gabor filter-bank, Multi-layer perceptron