Adaptive Neural Speech Enhancement with a Denoising Variational Autoencoder.
INTERSPEECH (2020)
Abstract
This paper presents a neural speech enhancement method with a statistical feedback mechanism based on a denoising variational autoencoder (VAE). Deep generative models of speech signals have been combined with unsupervised noise models to enhance speech robustly even under conditions mismatched from the training data. This approach, however, often yields unnatural speech-like noise due to an unsuitable prior distribution on the latent speech representations. To mitigate this problem, we use a denoising VAE whose encoder estimates the latent vectors of clean speech from an input mixture signal. This encoder network serves as a prior distribution in the probabilistic generative model of the input mixture, and its condition mismatch is handled in a Bayesian manner. The speech signal is estimated by updating the latent vectors to fit the input mixture, while the noise is estimated by a nonnegative matrix factorization (NMF) model. To efficiently train the encoder network, we also propose multi-task learning of the denoising VAE with standard mask-based enhancement. Experimental results show that our method outperforms existing mask-based and generative enhancement methods in unknown conditions.
Keywords
speech enhancement, deep speech prior, denoising variational autoencoder, nonnegative matrix factorization