WA-Transformer: Window Attention-based Transformer with Two-stage Strategy for Multi-task Audio Source Separation
Conference of the International Speech Communication Association (INTERSPEECH), 2022
Abstract
The standard Conformer adopts convolution layers to exploit local features. However, one-dimensional convolution ignores the correlation of adjacent time-frequency features. In this paper, we design a two-dimensional window attention block with dilation and propose a window attention-based Transformer network (WA-Transformer) for multi-task audio source separation. WA-Transformer adopts self-attention and window attention blocks to model global dependencies and local correlations in a parameter-efficient way. It also follows a two-stage pipeline: the first stage separates the mixture into three types of audio signals, and the second stage performs signal compensation. Experiments demonstrate the effectiveness of WA-Transformer: it achieves 13.86 dB, 12.22 dB, and 11.21 dB signal-to-distortion ratio improvements on the speech, music, and noise tracks, respectively, outperforming several well-known models.
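The abstract does not spell out the block's implementation, so the following is only a minimal PyTorch sketch of what a dilated two-dimensional window attention block over time-frequency features could look like. The class name WindowAttention2D, the (B, T, F, C) layout, and the window/dilation defaults are all assumptions for illustration, not the authors' actual code.

import torch
import torch.nn as nn

class WindowAttention2D(nn.Module):
    # Sketch of dilated 2-D window attention over time-frequency features.
    # Names and defaults are illustrative, not the paper's configuration.
    def __init__(self, dim, num_heads=4, window=4, dilation=1):
        super().__init__()
        self.w, self.d = window, dilation
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, T, F, C) -- batch, time frames, frequency bins, channels
        B, T, F, C = x.shape
        w, d = self.w, self.d
        assert T % (w * d) == 0 and F % (w * d) == 0, "pad T and F first"
        nT, nF = T // (w * d), F // (w * d)
        # Partition into dilated windows: tokens inside one window sit
        # d steps apart along both the time and frequency axes.
        x = x.view(B, nT, w, d, nF, w, d, C)
        x = x.permute(0, 1, 3, 4, 6, 2, 5, 7)   # (B, nT, d, nF, d, w, w, C)
        win = x.reshape(-1, w * w, C)           # each window is a token sequence
        h = self.norm(win)
        out, _ = self.attn(h, h, h)             # attention within each window
        win = win + out                         # residual connection
        # Undo the window partition back to (B, T, F, C).
        win = win.view(B, nT, d, nF, d, w, w, C)
        win = win.permute(0, 1, 5, 2, 3, 6, 4, 7).reshape(B, T, F, C)
        return win

# Example: 4x4 windows with dilation 2 over a 64x64 feature map.
block = WindowAttention2D(dim=32, num_heads=4, window=4, dilation=2)
feats = torch.randn(2, 64, 64, 32)              # (B, T, F, C)
print(block(feats).shape)                       # torch.Size([2, 64, 64, 32])

Under these assumptions, dilation widens each window's time-frequency footprint without increasing the number of tokens per attention call, which is one plausible reading of the parameter-efficiency claim above.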
Key words
multi-task audio source separation, WA-Transformer, two-dimensional window attention