Selection and Training Schemes for Improving TTS Voice Built on Found Data

F.-Y. Kuo, I. C. Ouyang, S. Aryal, Pierre Lanchantin

INTERSPEECH (2019)

Abstract

This work investigates selection and training schemes for improving the naturalness of text-to-speech (TTS) voices built on found data. We examine combinations of different metrics for detecting and rejecting segments of training data that can degrade system performance. We conducted a series of objective and subjective experiments on two 24-hour single-speaker corpora of found data collected from diverse sources. We show that a smaller, yet carefully selected, subset of the data can yield a text-to-speech system that generates more natural speech than a system trained on the complete dataset. Moreover, we show that fine-tuning from the system trained on the whole dataset further improves naturalness by allowing a more aggressive selection of training data.

Keywords

TTS, data selection, found data
