Data Selection for Improving Naturalness of TTS Voices Trained on Small Found Corpuses.

SLT (2018)

Abstract
This work investigates techniques for selecting training data from small, found corpuses in order to improve the naturalness of synthesized text-to-speech voices. The approach examines different metrics for detecting and rejecting segments of training data that can degrade the performance of the system. We conducted experiments on two small datasets extracted from Mandarin Chinese audiobooks that differ in recording conditions, narrator, and transcriptions. We show that an even smaller, yet carefully selected, set of data can yield a text-to-speech system that generates more natural speech than a system trained on the complete dataset. Three metrics proposed in the paper, all related to the narrator's articulation, give significant improvements in naturalness.
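
As a rough illustration of metric-based data selection in general, the sketch below ranks utterances by a single per-utterance score and keeps only the best-scoring fraction. It is not the paper's method: the `Utterance` type, the `score` field, the `select_utterances` function, and the `keep_fraction` parameter are all hypothetical names, and the abstract does not say how its metrics are computed or thresholded.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    duration_s: float
    score: float  # hypothetical quality metric; assumed higher is better


def select_utterances(utterances, keep_fraction):
    """Return the best-scoring share of the utterances, highest score first."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(utterances, key=lambda u: u.score, reverse=True)
    # Round to the nearest count, but keep at least one utterance.
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Toy corpus with made-up scores, only to show the call.
corpus = [
    Utterance("a", 3.0, 0.9),
    Utterance("b", 2.5, 0.2),
    Utterance("c", 4.0, 0.7),
    Utterance("d", 1.0, 0.4),
]
selected = select_utterances(corpus, keep_fraction=0.5)
```

Ranking and keeping a fixed fraction is only one possible design; a fixed score threshold would be an equally plausible choice when the metric has a meaningful absolute scale.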
Keywords
Measurement,Speech,Standards,Training,Hidden Markov models,Buildings,Acoustics