Dual branch deep interactive UNet for monaural noisy-reverberant speech enhancement

Applied Acoustics (2023)

Abstract
Noise and reverberation can severely degrade speech quality and intelligibility, and many deep neural network-based methods have been proposed for noisy-reverberant speech enhancement. The classic approaches are spectral masking and spectral mapping, which have different advantages and disadvantages across noise environments and are complementary. This paper proposes a dual branch deep interactive UNet (DBDIUNet) for monaural speech enhancement that combines the advantages of both. The DBDIUNet uses a classical encoder-decoder architecture with a shared encoder and two decoders. One decoder outputs the complex ideal ratio mask (cIRM), and the other outputs the enhanced complex spectrum. The two outputs are coupled by coherent averaging to obtain the enhanced speech signal. A novel deep interaction structure exchanges information between the two decoders and achieves a significant performance improvement with minimal additional computational cost and hyperparameters. Relative to the unprocessed noisy speech on the Interspeech 2020 Deep Noise Suppression Challenge blind test set, DBDIUNet improves WB-PESQ, NB-PESQ, STOI, and SI-SDR by 1.575, 0.955, 7.9%, and 8.67, respectively. In the noisy-reverberant speech enhancement test, DBDIUNet improves WB-PESQ, STOI, SI-SDR, DNSMOS, and SRMR by 0.98, 10.24%, 5.43, 1.51, and 3.43, respectively, exceeding the state-of-the-art model.
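
The coupling of the two decoder outputs can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formula for "coherent averaging", so the sketch assumes an equal-weight average taken in the complex domain: the cIRM branch is applied to the noisy spectrum by complex multiplication, and the result is averaged with the spectrum from the mapping branch. The function names `apply_complex_mask` and `combine_branches`, the array shapes, and the 0.5 weighting are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def apply_complex_mask(mask, noisy_spec):
    """Apply a complex ratio mask to a complex spectrum (element-wise complex product)."""
    return mask * noisy_spec


def combine_branches(mask, mapped_spec, noisy_spec):
    """Average the masking-branch and mapping-branch estimates.

    Assumption: "coherent averaging" is taken here to mean an equal-weight
    average of the two complex spectra, so that phase is preserved.
    """
    masked_spec = apply_complex_mask(mask, noisy_spec)
    return 0.5 * (masked_spec + mapped_spec)


# Toy data standing in for STFT frames: (frequency bins, time frames), hypothetical sizes.
rng = np.random.default_rng(0)
shape = (4, 3)
noisy = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Stand-ins for the two decoder outputs: an identity mask and an unchanged spectrum.
mask = np.ones(shape, dtype=complex)
mapped = noisy.copy()

enhanced = combine_branches(mask, mapped, noisy)
```

With an identity mask and an unchanged mapping output, the combined spectrum equals the noisy input, which is a quick sanity check of the averaging step. In a real system the enhanced spectrum would then be passed through an inverse STFT to recover the waveform.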
Keywords
Noisy-reverberant speech enhancement, Dual branch UNet, Deep interactive temporal-frequency attention, Frequency domain