Monaural speech enhancement using U-net fused with multi-head self-attention

Chinese Journal of Acoustics (2023)

Abstract
Under low signal-to-noise ratio (SNR) and burst noise conditions, existing deep learning models for speech enhancement do not perform satisfactorily. Humans, in contrast, can exploit the long-term correlation of speech to form an integrated perception of different speech signals. Describing the long-term dependencies of speech can therefore help improve enhancement performance under low SNR and burst noise conditions. Inspired by this observation, a time-domain, end-to-end monaural speech enhancement model, TU-net, is proposed that fuses the multi-head self-attention mechanism with the U-net deep network. TU-net adopts the encoder-decoder layer structure of U-net to achieve multi-scale feature fusion. It introduces a dual-path Transformer module that uses multi-head self-attention to calculate the speech mask and to better model long-term correlation. The model is trained with a loss function that is a weighted sum of terms in the time, time-frequency, and perceptual domains. Simulation experiments show that, while keeping the number of model parameters relatively small, TU-net outperforms comparable monaural enhancement models on several evaluation metrics, including perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and SNR gain, under low SNR and burst noise conditions.
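The weighted-sum loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the individual loss terms or their weights, so everything here is an assumption: the time-domain term is a mean absolute error on the waveform, the time-frequency term is a mean absolute error on Hann-windowed DFT magnitudes, the perceptual term is omitted entirely, and the names `time_loss`, `spectral_loss`, `weighted_loss`, `w_time`, and `w_spec` are hypothetical. The sketch uses only the Python standard library so that it stays self-contained.

```python
import cmath
import math


def frame_magnitudes(signal, frame_len=64, hop=32):
    """Return Hann-windowed DFT magnitudes for each frame of `signal`."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        # Only the non-negative frequencies are needed for a real signal.
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        spectra.append(bins)
    return spectra


def time_loss(estimate, target):
    """Assumed time-domain term: mean absolute error on the waveform."""
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)


def spectral_loss(estimate, target, frame_len=64, hop=32):
    """Assumed time-frequency term: mean absolute error on magnitudes."""
    est = frame_magnitudes(estimate, frame_len, hop)
    ref = frame_magnitudes(target, frame_len, hop)
    total = sum(abs(a - b)
                for est_frame, ref_frame in zip(est, ref)
                for a, b in zip(est_frame, ref_frame))
    count = sum(len(ref_frame) for ref_frame in ref)
    return total / count


def weighted_loss(estimate, target, w_time=0.5, w_spec=0.5):
    """Weighted sum of the two assumed terms (weights are hypothetical)."""
    return (w_time * time_loss(estimate, target)
            + w_spec * spectral_loss(estimate, target))


# Toy signals: a clean sinusoid and a copy with alternating-sign noise.
clean = [math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]
noisy = [c + 0.1 * (-1) ** n for n, c in enumerate(clean)]

loss_same = weighted_loss(clean, clean)
loss_noisy = weighted_loss(noisy, clean)
```

The loss is zero when the estimate equals the target and grows as the estimate departs from it in either domain. A perceptual term, once defined, would enter the sum in the same way with its own weight.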