Improving Speech Enhancement with Phonetic Embedding Features

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Abstract
In this paper, we present a speech enhancement framework that leverages phonetic information obtained from an acoustic model. It consists of two separate components: (i) a long short-term memory recurrent neural network (LSTM-RNN) based speech enhancement model that takes the combination of log-power spectra (LPS) and phonetic embedding features as input and predicts the complex ideal ratio mask (cIRM); and (ii) a convolutional, long short-term memory and fully connected deep neural network (CLDNN) based acoustic model that extracts the phonetic feature vector from the hidden units of its LSTM layer. Our experimental results show that the proposed framework outperforms both conventional and phoneme-dependent speech enhancement systems under various noisy conditions, generalizes well to unseen conditions, and is robust to speech interference. We further demonstrate its superior enhancement performance on unvoiced speech and report a preliminary yet promising recognition experiment on real test data.
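The data flow described in the abstract can be sketched in a few lines: per-frame phonetic embeddings are concatenated with the LPS features to form the enhancement model's input, and the predicted cIRM is applied to the noisy STFT by complex multiplication. This is a minimal NumPy sketch; the dimensions (257 frequency bins, a 1024-dim embedding) and the random stand-ins for the model output are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical dimensions (illustrative, not from the paper):
# T frames, F = 257 frequency bins, E = 1024-dim phonetic embedding
# taken from the hidden units of the acoustic model's LSTM layer.
T, F, E = 100, 257, 1024

lps = np.random.randn(T, F)        # log-power spectra of the noisy speech
phone_emb = np.random.randn(T, E)  # per-frame phonetic embedding features

# Frame-wise concatenation forms the enhancement model's input.
x = np.concatenate([lps, phone_emb], axis=-1)  # shape (T, F + E)

# The LSTM-RNN predicts a complex ideal ratio mask (cIRM): one real
# and one imaginary component per time-frequency bin. Here a random
# array stands in for the network output.
cirm = np.random.randn(T, F) + 1j * np.random.randn(T, F)

# Enhancement is element-wise complex multiplication of the mask
# with the noisy STFT, followed by an inverse STFT (omitted).
noisy_stft = np.random.randn(T, F) + 1j * np.random.randn(T, F)
enhanced_stft = cirm * noisy_stft  # shape (T, F), complex
```

The key design point the abstract emphasizes is that the phonetic embedding is an auxiliary input feature, not a hard phoneme label, which is what lets the system avoid a separate per-phoneme model.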
Keywords
Speech enhancement, acoustic model, phonetic embedding feature, unvoiced speech