End-to-End Code-Switching TTS with Cross-Lingual Language Model

2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (2020)

Abstract
Code-switching text-to-speech (TTS) aims to enable a system to speak two languages with a single voice within the same utterance. In this paper, we propose incorporating cross-lingual word embeddings into an end-to-end TTS system to improve voice rendering. The cross-lingual word embeddings, generated from a pre-trained cross-lingual language model, encode words of two languages in the same embedding space and therefore allow words across languages to share each other's contextual information, which is useful for the voice rendering of code-switching content. To investigate the effectiveness of this idea, we conduct studies on two multi-speaker monolingual corpora, namely the THCHS30 Mandarin and LibriTTS English databases. The evaluation results show that the proposed framework outperforms the baseline systems when presented with code-switching text input and achieves state-of-the-art performance.
Keywords
text-to-speech, code-switching, cross-lingual word embedding, end-to-end
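
The idea of a shared embedding space described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the table `SHARED_EMBEDDINGS`, its three-dimensional vectors, and the helper names `cosine` and `embed_utterance` are hypothetical and chosen only for illustration. In a real system the vectors would come from a pre-trained cross-lingual language model, and how they are fed into the TTS model is not specified in the abstract.

```python
import math

# Hypothetical toy vectors. A real system would obtain these from a
# pre-trained cross-lingual language model, not from hand-written values.
SHARED_EMBEDDINGS = {
    "hello": [0.9, 0.1, 0.0],
    "你好": [0.8, 0.2, 0.1],
    "speech": [0.1, 0.9, 0.2],
    "语音": [0.2, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embed_utterance(words):
    """Look up every word, regardless of language, in the one shared table."""
    return [SHARED_EMBEDDINGS[w] for w in words]


# A code-switching utterance mixing Mandarin and English words.
mixed_utterance = ["你好", "speech"]
vectors = embed_utterance(mixed_utterance)

# In this toy table, translation pairs lie closer together than unrelated words.
same_meaning = cosine(SHARED_EMBEDDINGS["hello"], SHARED_EMBEDDINGS["你好"])
different_meaning = cosine(SHARED_EMBEDDINGS["hello"], SHARED_EMBEDDINGS["语音"])
```

Because both languages index into a single table, a downstream model sees one consistent representation for mixed-language input, which is the property the abstract credits for better rendering of code-switching content.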