Multi-layer encoder-decoder time-domain single channel speech separation

Pattern Recognition Letters (2024)

Abstract
With the emergence of increasingly advanced separation networks, time-domain speech separation methods have made significant progress. These methods typically use a temporal encoder–decoder structure to encode speech feature sequences and then perform separation on the encoded representation. However, the traditional encoder–decoder structure imposes a trade-off: when the encoded sequence is short, separation performance degrades sharply, and when the encoded sequence is long enough to separate well, computational complexity and training cost rise accordingly. This paper therefore compresses and reconstructs the speech feature sequence with a multi-layer convolutional structure and proposes a multi-layer encoder–decoder time-domain speech separation model (MLED). The encoder–decoder in MLED compresses the speech sequence to a short length without degrading separation performance, and, combined with the proposed multi-scale temporal attention (MSTA) separation network, achieves efficient and precise separation of the short encoded sequences. Experiments show that, compared with previous state-of-the-art time-domain separation methods, MLED attains competitive separation performance with a smaller model size, lower computational complexity, and reduced training cost.
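As a rough illustration of the compression the abstract describes (the layer count, kernel sizes, and strides below are hypothetical, not taken from the paper), stacking strided 1-D convolutions shrinks the encoded sequence by roughly the product of the strides, so a multi-layer encoder can reach a far shorter sequence than a single-layer one:

```python
def conv1d_out_len(length, kernel, stride):
    """Output length of an unpadded 1-D convolution."""
    return (length - kernel) // stride + 1

def encoded_len(length, layers):
    """Encoded sequence length after a stack of strided conv layers."""
    for kernel, stride in layers:
        length = conv1d_out_len(length, kernel, stride)
    return length

# Hypothetical 3-layer encoder: strides compound, so one second of
# 16 kHz audio (16000 samples) compresses to 124 frames, versus
# 1999 frames for a single-layer encoder with the same first layer.
layers = [(16, 8), (4, 4), (4, 4)]
print(encoded_len(16000, layers))    # -> 124
print(conv1d_out_len(16000, 16, 8))  # -> 1999
```

A separation network operating on the 124-frame sequence sees a roughly 16x shorter input than one on the single-layer encoding, which is where the claimed savings in computation and training cost would come from.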