Improving Speech Enhancement with Phonetic Embedding Features

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Abstract
In this paper, we present a speech enhancement framework that leverages phonetic information obtained from an acoustic model. It consists of two separate components: (i) a long short-term memory recurrent neural network (LSTM-RNN) based speech enhancement model that takes the combination of log-power spectra (LPS) and phonetic embedding features as input and predicts the complex ideal ratio mask (cIRM); and (ii) a convolutional, long short-term memory and fully connected deep neural network (CLDNN) based acoustic model that extracts the phonetic feature vector from the hidden units of its LSTM layer. Our experimental results show that the proposed framework outperforms both conventional and phoneme-dependent speech enhancement systems under various noisy conditions, generalizes well to unseen conditions, and is robust to speech interference. We further demonstrate its superior enhancement performance on unvoiced speech and report a preliminary yet promising recognition experiment on real test data.
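The data flow described in the abstract can be sketched in a few lines: per-frame phonetic embeddings are concatenated with the LPS features to form the enhancement model's input, and the predicted cIRM is applied to the noisy STFT by complex multiplication. This is a minimal NumPy sketch; the dimensions (257 frequency bins, a 1024-dim embedding) and the random stand-ins for the model output are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical dimensions (illustrative, not from the paper):
# T frames, F = 257 frequency bins, E = 1024-dim phonetic embedding
# taken from the hidden units of the acoustic model's LSTM layer.
T, F, E = 100, 257, 1024

lps = np.random.randn(T, F)        # log-power spectra of the noisy speech
phone_emb = np.random.randn(T, E)  # per-frame phonetic embedding features

# Frame-wise concatenation forms the enhancement model's input.
x = np.concatenate([lps, phone_emb], axis=-1)  # shape (T, F + E)

# The LSTM-RNN predicts a complex ideal ratio mask (cIRM): one real
# and one imaginary component per time-frequency bin. Here a random
# array stands in for the network output.
cirm = np.random.randn(T, F) + 1j * np.random.randn(T, F)

# Enhancement is element-wise complex multiplication of the mask
# with the noisy STFT, followed by an inverse STFT (omitted).
noisy_stft = np.random.randn(T, F) + 1j * np.random.randn(T, F)
enhanced_stft = cirm * noisy_stft  # shape (T, F), complex
```

The key design point the abstract emphasizes is that the phonetic embedding is an auxiliary input feature, not a hard phoneme label, which is what lets the system avoid a separate per-phoneme model.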
Keywords
Speech enhancement, acoustic model, phonetic embedding feature, unvoiced speech