SpeakerGAN: Speaker identification with conditional generative adversarial network

Neurocomputing (2020)

Abstract
Current methods based on traditional i-vectors and deep neural networks (DNNs) have proven effective for speaker identification, especially on large-scale corpora. However, when the training dataset is small, overfitting may occur and degrade performance. Moreover, robust identification remains a challenging problem even under less strict requirements. This paper proposes a novel approach, SpeakerGAN, for speaker identification with a conditional generative adversarial network (CGAN). It allows the adversarial network to distinguish real from fake samples and to predict class labels simultaneously. We configure the generator and the discriminator of SpeakerGAN with a gated convolutional neural network (CNN) and a modified residual network (ResNet) to obtain highly diverse generated samples and to increase network capacity. Multiple loss functions are combined and optimized to encourage correct mapping and accelerate convergence. Experimental results show that SpeakerGAN reduces the classification error rate by 87% and 16% compared with the traditional i-vector system and the state-of-the-art DNN-based method, respectively. With limited training data, SpeakerGAN obtains significant improvements over the baselines. When 1.6 s of speech per speaker is used for testing, SpeakerGAN achieves an identification accuracy of 98.20%, which suggests its promise for short-utterance speaker identification.
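The abstract says the discriminator both separates real from generated samples and predicts speaker labels, that the generator uses gated convolutions, and that several losses are combined. The sketch below illustrates those ideas in plain Python under stated assumptions: it uses an AC-GAN-style auxiliary-classifier loss (adversarial binary cross-entropy plus a speaker-classification cross-entropy) and a gated linear unit, a * sigmoid(b). The function names, the `class_weight` parameter, and the choice to simply sum the terms are hypothetical; the abstract does not give SpeakerGAN's actual loss terms or weights.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def glu(a: Sequence[float], b: Sequence[float]) -> list[float]:
    """Gated linear unit: element-wise a * sigmoid(b), the gating used in gated CNNs."""
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


def bce_with_logit(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy on a raw real/fake logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def cross_entropy(class_logits: Sequence[float], label: int) -> float:
    """Categorical cross-entropy via a stable log-softmax over speaker logits."""
    m = max(class_logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in class_logits))
    return log_z - class_logits[label]


def discriminator_loss(real_logit: float, fake_logit: float,
                       real_class_logits: Sequence[float], real_label: int,
                       class_weight: float = 1.0) -> float:
    # Hypothetical combination: real -> 1, fake -> 0, plus speaker
    # classification of the real sample, weighted by class_weight.
    adversarial = bce_with_logit(real_logit, 1.0) + bce_with_logit(fake_logit, 0.0)
    return adversarial + class_weight * cross_entropy(real_class_logits, real_label)


def generator_loss(fake_logit: float, fake_class_logits: Sequence[float],
                   cond_label: int, class_weight: float = 1.0) -> float:
    # The generator wants its sample judged real AND classified as the
    # speaker label it was conditioned on (the "conditional" part).
    adversarial = bce_with_logit(fake_logit, 1.0)
    return adversarial + class_weight * cross_entropy(fake_class_logits, cond_label)


if __name__ == "__main__":
    # An undecided discriminator (all-zero logits, 2 speakers) pays log 2 per term.
    print(discriminator_loss(0.0, 0.0, [0.0, 0.0], 0))  # 3 * log 2
    print(generator_loss(0.0, [0.0, 0.0], 1))           # 2 * log 2
```

In a real model these scalars would be network outputs averaged over a batch; single values are used here only to keep the arithmetic visible.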
Keywords
Speaker identification, Generative adversarial networks, Deep neural networks