A New Spoken Language Teaching Tech: Combining Multi-attention and AdaIN for One-shot Cross Language Voice Conversion

2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)(2022)

引用 0|浏览3
暂无评分
摘要
Computer aided pronunciation training(CAPT) plays an important role in oral language teaching. The main methods of traditional computer-assisted oral teaching include mispronunciation detection and pronunciation scoring and assessment.However, these two techniques only give negative feedback information such as scores or error categories. In this case,it is difficult for learners to refine their pronunciation through these two indicators without the guidance of correct speech.To tackle this problem, we proposed a cross language voice conversion(VC) framework that can generate speech with template speech content and learners’ own timbre,which can guide the learner’s pronunciation.To improve VC effect,we apply AdaIN in the fore-end and after the Value matrix in multi-head attention once respectively,called attention-AdaIN,which can improve the style transfer and sequence generation ability.We used attention-AdaIN to construct VC framework based on VAE.Experiments conducted on the AISHELL-3 and VCTK corpus showed that this new aprroach improved the baseline VAE-VC.
更多
查看译文
关键词
Voice conversion,AdaIN,multi-head attention
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要