谷歌浏览器插件
订阅小程序
在清言上使用

Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN

INTERSPEECH(2019)

引用 13|浏览10
暂无评分
摘要
The quality of speech synthesis systems can be significantly deteriorated by the presence of background noise in the recordings. Despite the existence of speech enhancement techniques for effectively suppressing additive noise under low signal-to-noise (SNR) conditions, these techniques have been neither designed nor tested in speech synthesis tasks where background noise has relatively lower energy. In this paper, we propose a speech enhancement technique based on generative adversarial networks (GANs) which acts as a preprocessing step of speech synthesis. Motivated by the speech enhancement generative adversarial network (SEGAN) approach and recent advances in deep learning, we propose to use Wasserstein GAN (WGAN) with gradient penalty and gated activation functions to the autoencoder network of SEGAN. We studied the impact of the proposed method on a data set consisting of 28 speakers and different noise types with 3 different SNR level. The effectiveness of the proposed method in the context of speech synthesis is demonstrated through the training of WaveNet vocoder. We compare our method against SEGAN. Both subjective and objective metrics confirm that the proposed speech enhancement approach outperforms SEGAN in terms of speech synthesis quality.
更多
查看译文
关键词
Wasserstein GAN, Speech Enhancement, Gated activation, WaveNet Vocoder, Speech Synthesis
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要