Learning Filterbanks from Raw Speech for Phone Recognition

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
We train a bank of complex filters that operates on the raw waveform and feeds a convolutional neural network for end-to-end phone recognition. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks and then fine-tuned jointly with the rest of the convolutional architecture. We perform phone recognition experiments on TIMIT and show that, for several architectures, models trained on TD-filterbanks consistently outperform their counterparts trained on comparable mel-filterbanks. We obtain our best performance by learning all front-end steps, from pre-emphasis up to averaging. Finally, we observe that the filters at convergence have an asymmetric impulse response, and that some of them remain almost analytic.
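The abstract describes the front-end only at a high level: complex filters on the raw waveform, initialized to mimic mel-filterbanks, with pre-emphasis before and averaging after. The sketch below shows what such a front-end could compute at initialization. It is not the authors' implementation. Several choices are assumptions not stated above: complex Gabor filters (Gaussian envelope times a complex exponential) with mel-spaced center frequencies, a 16 kHz sample rate, 40 filters of 400 samples, a pre-emphasis coefficient of 0.97, squared modulus followed by Hann-window averaging with a 160-sample stride, and log compression with a small offset. The function names `gabor_filterbank` and `td_filterbank` are hypothetical. NumPy computes only the forward pass; the joint fine-tuning described in the abstract would require an autodiff framework.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the abstract.
SAMPLE_RATE = 16000  # TIMIT audio is 16 kHz
N_FILTERS = 40       # number of complex filters
WIDTH = 400          # filter / averaging window length (25 ms at 16 kHz)
HOP = 160            # averaging stride (10 ms at 16 kHz)


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def gabor_filterbank(n_filters=N_FILTERS, width=WIDTH, sr=SAMPLE_RATE,
                     fmin=60.0, fmax=7800.0):
    """Hypothetical mel-like initialization: complex Gabor filters whose
    center frequencies are evenly spaced on the mel scale."""
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2))
    centers = hz_points[1:-1]
    bandwidths = hz_points[2:] - hz_points[:-2]  # span of the matching triangle
    t = (np.arange(width) - width // 2) / sr     # time axis in seconds, centered on 0
    filters = np.empty((n_filters, width), dtype=np.complex128)
    for k, (fc, bw) in enumerate(zip(centers, bandwidths)):
        sigma = 1.0 / (np.pi * bw)               # assumed envelope width for this bandwidth
        envelope = np.exp(-0.5 * (t / sigma) ** 2)
        filters[k] = envelope * np.exp(2j * np.pi * fc * t)  # analytic-like complex filter
        filters[k] /= np.linalg.norm(filters[k])             # unit L2 norm
    return filters


def td_filterbank(wave, filters, hop=HOP, preemph=0.97, eps=1e-6):
    """Forward pass: pre-emphasis -> complex filtering -> squared modulus
    -> Hann-window averaging with stride `hop` -> log compression."""
    x = np.append(wave[0], wave[1:] - preemph * wave[:-1])   # pre-emphasis
    energies = np.stack([np.abs(np.convolve(x, f, mode="same")) ** 2
                         for f in filters])                  # (n_filters, n_samples)
    width = filters.shape[1]
    hann = np.hanning(width)
    hann /= hann.sum()                                       # low-pass averaging window
    n_frames = 1 + (energies.shape[1] - width) // hop
    frames = np.stack([energies[:, i * hop:i * hop + width] @ hann
                       for i in range(n_frames)], axis=1)    # (n_filters, n_frames)
    return np.log(frames + eps)                              # eps avoids log(0)


# Usage: half a second of a noisy 440 Hz tone.
rng = np.random.default_rng(0)
t = np.arange(SAMPLE_RATE // 2) / SAMPLE_RATE
wave = np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(t.size)
feats = td_filterbank(wave, gabor_filterbank())
print(feats.shape)  # (40, 48)
```

Because every step is an ordinary linear or pointwise operation, the same pipeline can be written with trainable convolution and pooling layers. That is what allows the filters, the pre-emphasis, and the averaging window to be learned end to end rather than fixed.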
Keywords
phone recognition experiments, front-end steps, raw speech, complex filters, raw waveform, convolutional neural network, end-to-end phone recognition, time-domain filterbanks, mel-filterbank approximation, TIMIT, asymmetric impulse response, TD-filterbanks, convolutional architecture