LSTM-convolutional-BLSTM Encoder-Decoder Network for Minimum Mean-Square Error Approach to Speech Enhancement

Zeyu Wang, Tao Zhang, Yangyang Shao, Biyun Ding

Applied Acoustics (2021)

Abstract

In recent years, deep learning models have been employed for speech enhancement. Most existing deep-learning-based methods use fully Convolutional Neural Networks (CNNs) to capture the time–frequency information of input features. Compared with CNNs, a Long Short-Term Memory (LSTM) network is better suited to capturing contextual information along the time axis of the features. However, the computational load of a fully LSTM structure is heavy. To balance model complexity against the capability of capturing time–frequency features, we present an LSTM-Convolutional-BLSTM Encoder-Decoder (LCLED) network for speech enhancement. The LCLED additionally incorporates transpose convolution and skip connections. The key idea is to use two LSTM parts and convolutional layers to model the contextual information and the frequency-dimension features, respectively. Furthermore, to achieve higher-quality enhanced speech, the a priori Signal-to-Noise Ratio (SNR) is used as the learning target of the LCLED, and the Minimum Mean-Square Error (MMSE) approach is used for postprocessing. The results indicate that, compared with the fully LSTM structure, the proposed LCLED not only reduces model complexity and training time but also improves the quality and intelligibility of the enhanced speech.
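To illustrate how an estimated a priori SNR can drive the postprocessing step, the following is a minimal sketch, not the authors' method. The abstract does not state which MMSE estimator is used, so the sketch substitutes the simpler Wiener gain G = ξ / (1 + ξ) as a stand-in. The function names, the decibel input convention, and the per-frame list layout are all assumptions made for illustration.

```python
def a_priori_snr_db_to_linear(xi_db):
    """Convert an a priori SNR estimate from decibels to linear scale."""
    return 10.0 ** (xi_db / 10.0)


def wiener_gain(xi):
    """Wiener gain G = xi / (1 + xi) for a linear a priori SNR xi >= 0.

    Assumption: used here as a simple stand-in for the MMSE gain
    function, which the abstract does not specify.
    """
    return xi / (1.0 + xi)


def enhance_frame(noisy_magnitudes, xi_db_estimates):
    """Scale each frequency bin of one noisy magnitude frame by its gain.

    noisy_magnitudes: magnitude spectrum of one frame (hypothetical layout).
    xi_db_estimates:  per-bin a priori SNR in dB, e.g. a network output.
    """
    gains = [wiener_gain(a_priori_snr_db_to_linear(x)) for x in xi_db_estimates]
    return [g * m for g, m in zip(gains, noisy_magnitudes)]
```

With this gain, a bin whose estimated a priori SNR is 0 dB is attenuated by half, while bins with high estimated SNR pass nearly unchanged, which is why the accuracy of the a priori SNR estimate matters for the quality of the enhanced speech.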

Keywords

Speech enhancement, LSTM-Convolutional-BLSTM Encoder-Decoder network, Transpose convolution, Minimum Mean-Square Error
