Nonparallel Voice Conversion With Augmented Classifier Star Generative Adversarial Networks
IEEE/ACM Transactions on Audio, Speech, and Language Processing (2020)
Abstract
We previously proposed a method for nonparallel voice conversion (VC) based on StarGAN, a variant of generative adversarial networks (GANs). The method, called StarGAN-VC, has three main features. First, it requires no parallel utterances, transcriptions, or time alignment procedures to train the speech generator. Second, it learns mappings across multiple domains simultaneously using a single generator network, and can thus fully exploit training data collected from multiple domains to capture latent features common to all of them. Third, it generates converted speech signals quickly enough to allow real-time implementations, and requires only several minutes of training examples to generate reasonably realistic-sounding speech. In this article, we describe three formulations of StarGAN, including a newly introduced variant called "Augmented classifier StarGAN (A-StarGAN)", and compare them in a nonparallel VC task. We also compare them with several baseline methods.
Keywords
Voice conversion (VC), nonparallel VC, multi-domain VC, generative adversarial networks (GANs), CycleGAN, StarGAN, A-StarGAN