Learning Filterbanks from Raw Speech for Phone Recognition

Neil Zeghidour, Nicolas Usunier, Iasonas Kokkinos, Thomas Schatz, Gabriel Synnaeve, Emmanuel Dupoux

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018

Abstract

We train a bank of complex filters that operates on the raw waveform and is fed into a convolutional neural network for end-to-end phone recognition. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks, and then fine-tuned jointly with the remaining convolutional architecture. We perform phone recognition experiments on TIMIT and show that for several architectures, models trained on TD-filterbanks consistently outperform their counterparts trained on comparable mel-filterbanks. We get our best performance by learning all front-end steps, from pre-emphasis up to averaging. Finally, we observe that the filters at convergence have an asymmetric impulse response, and that some of them remain almost analytic.
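
The front-end chain described in the abstract (pre-emphasis, complex filtering of the raw waveform, averaging) can be illustrated with a minimal NumPy sketch. It covers only a fixed, mel-like initialization and not the joint fine-tuning with the convolutional network. Everything specific here is an assumption for illustration and is not taken from the abstract: the Gabor filter shape, the bandwidth heuristic, the squared-Hanning lowpass, the log compression, the pre-emphasis coefficient of 0.97, and the defaults of 40 filters, 400-sample windows, and a 160-sample hop at 16 kHz. The function names `gabor_bank` and `td_filterbank` are hypothetical.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)

def gabor_bank(n_filters=40, width=400, sample_rate=16000):
    """Complex Gabor filters with center frequencies evenly spaced in mel."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    centers = edges[1:-1]
    # Crude heuristic: wider mel bands get shorter Gaussian envelopes.
    bandwidths = edges[2:] - edges[:-2]
    sigma = sample_rate / (np.pi * bandwidths)  # envelope width in samples
    t = np.arange(width) - width // 2
    envelope = np.exp(-0.5 * (t[None, :] / sigma[:, None]) ** 2)
    carrier = np.exp(2j * np.pi * centers[:, None] * t[None, :] / sample_rate)
    filters = envelope * carrier
    filters /= np.linalg.norm(filters, axis=1, keepdims=True)  # unit L2 norm
    return filters, centers

def td_filterbank(wave, filters, win=400, hop=160, preemph=0.97):
    """Pre-emphasis -> complex filtering -> squared modulus -> averaging -> log."""
    wave = np.asarray(wave, dtype=np.float64)
    # First-order pre-emphasis: y[n] = x[n] - preemph * x[n-1].
    emphasized = np.append(wave[0], wave[1:] - preemph * wave[:-1])
    # Averaging step: normalized squared Hanning window, decimated by `hop`.
    lowpass = np.hanning(win) ** 2
    lowpass /= lowpass.sum()
    rows = []
    for f in filters:
        energy = np.abs(np.convolve(emphasized, f, mode="same")) ** 2
        smoothed = np.convolve(energy, lowpass, mode="same")[::hop]
        rows.append(np.log1p(smoothed))  # compression choice is an assumption
    return np.stack(rows)  # shape: (n_filters, n_frames)
```

With these defaults, a one-second waveform sampled at 16 kHz yields a feature array of shape (40, 100). In a learnable version, the filter coefficients, the pre-emphasis, and the averaging window would all become trainable parameters; this sketch keeps them fixed.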

Keywords

phone recognition experiments, front-end steps, raw speech, complex filters, raw waveform, convolutional neural network, end-to-end phone recognition, time-domain filter banks, mel-filter bank approximation, TIMIT, asymmetric impulse response, TD-filter banks, convolutional architecture