Fre-GAN 2: Fast and Efficient Frequency-Consistent Audio Synthesis

Sang-Hoon Lee, Ji-Hoon Kim, Kangeun Lee, Seong-Whan Lee

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2022)

Abstract

Although recent neural vocoders have improved significantly, most of these models trade off audio quality against computational complexity. Because large models are impractical on low-resource devices, a practical neural vocoder must synthesize high-quality audio efficiently. In this paper, we present Fre-GAN 2, a fast and efficient high-quality audio synthesis model. For fast synthesis, the Fre-GAN 2 generator synthesizes only the low- and high-frequency parts of the audio and applies the inverse discrete wavelet transform to reproduce the target-resolution audio. We also introduce adversarial periodic feature distillation, which enables the model to synthesize high-quality audio with only a small number of parameters. Experimental results show the superiority of Fre-GAN 2 in audio quality. Furthermore, compared with Fre-GAN, Fre-GAN 2 generates audio 10.91x faster and has 21.23x fewer parameters.
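To illustrate how low- and high-frequency parts can be recombined into full-resolution audio, the following is a minimal sketch of a one-level inverse discrete wavelet transform. It assumes the Haar wavelet, which the abstract does not specify, and the function names `haar_dwt` and `haar_idwt` are hypothetical; this is not the authors' generator code, only an example of the transform the abstract refers to.

```python
import numpy as np


def haar_dwt(x):
    """One-level Haar DWT: split an even-length signal into two half-rate bands.

    Assumption: the Haar wavelet is used purely for illustration.
    """
    x = np.asarray(x, dtype=np.float64)
    if x.size % 2 != 0:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # low-frequency (approximation) band
    high = (even - odd) / np.sqrt(2.0)  # high-frequency (detail) band
    return low, high


def haar_idwt(low, high):
    """One-level inverse Haar DWT: merge two half-rate bands into one signal."""
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    if low.shape != high.shape:
        raise ValueError("both bands must have the same length")
    out = np.empty(low.size * 2, dtype=np.float64)
    out[0::2] = (low + high) / np.sqrt(2.0)  # even-indexed samples
    out[1::2] = (low - high) / np.sqrt(2.0)  # odd-indexed samples
    return out


# Round trip on a toy signal: the bands are half length, the output is full length.
signal = np.sin(np.linspace(0.0, 8.0 * np.pi, 16))
low_band, high_band = haar_dwt(signal)
reconstructed = haar_idwt(low_band, high_band)
```

In this sketch each band has half as many samples as the output signal, and the inverse transform restores the original sample rate exactly.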

Keywords

audio synthesis, neural vocoder, generative adversarial networks, speech synthesis, text-to-speech
