FRE-GAN 2: Fast and Efficient Frequency-Consistent Audio Synthesis

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Abstract
Although recent neural vocoders have improved significantly, most of them trade audio quality against computational complexity. Because large models are impractical on low-resource devices, a practical neural vocoder must synthesize high-quality audio efficiently. In this paper, we present Fre-GAN 2, a fast and efficient high-quality audio synthesis model. For fast synthesis, the generator of Fre-GAN 2 synthesizes only the low- and high-frequency parts of the audio and applies the inverse discrete wavelet transform to reproduce the target-resolution audio. We also introduce adversarial periodic feature distillation, which enables the model to synthesize high-quality audio with only a small number of parameters. Experimental results show the superiority of Fre-GAN 2 in audio quality. Furthermore, compared with Fre-GAN, Fre-GAN 2 achieves a 10.91x generation speedup and compresses the parameters by 21.23x.
Keywords
audio synthesis, neural vocoder, generative adversarial networks, speech synthesis, text-to-speech
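
The abstract states that the generator produces only low- and high-frequency parts and recovers the full-resolution waveform with the inverse discrete wavelet transform. The following is a minimal sketch of that reconstruction step for a single decomposition level. It assumes the Haar wavelet, which the abstract does not specify, and the function names `haar_dwt` and `haar_idwt` are hypothetical, not taken from the Fre-GAN 2 implementation.

```python
import math


def haar_dwt(signal):
    """One-level Haar DWT: split an even-length signal into two half-rate sub-bands.

    Assumption: Haar filters are used here for illustration only.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Low band: scaled pairwise sums. High band: scaled pairwise differences.
    low = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return low, high


def haar_idwt(low, high):
    """One-level inverse Haar DWT: merge two sub-bands into a signal twice as long."""
    if len(low) != len(high):
        raise ValueError("sub-bands must have the same length")
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(low, high):
        out.append((a + d) / s)  # recovers the even-indexed sample
        out.append((a - d) / s)  # recovers the odd-indexed sample
    return out
```

In this sketch, a model that predicts the two half-length sub-bands only needs to produce half as many time steps per band as the target waveform, and `haar_idwt` restores the full sample rate without any learned parameters.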