Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss
CoRR (2024)
Abstract
This research addresses the creation of personalized synthetic voices for head
and neck cancer survivors, focusing on tongue cancer patients whose speech may
exhibit severe articulation impairment. Our goal is to
restore normal articulation in the synthesized speech, while maximally
preserving the target speaker's individuality in terms of both the voice timbre
and speaking style. This is formulated as a task of learning from noisy labels.
We propose to augment the commonly used speech reconstruction loss with two
additional terms. The first term constitutes a regularization loss that
mitigates the impact of distorted articulation in the training speech. The
second term is a consistency loss that encourages correct articulation in the
generated speech. These additional loss terms are obtained from frame-level
articulation scores of original and generated speech, which are derived using a
separately trained phone classifier. Experimental results on a real case of a
tongue cancer patient confirm that the synthetic voice achieves comparable
articulation quality to unimpaired natural speech, while effectively
maintaining the target speaker's individuality. Audio samples are available at
https://myspeechproject.github.io/ArticulationRepair/.
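
The augmented loss described above can be sketched as follows. This is a minimal illustration, not the authors' formulation: the abstract does not give the exact form of either term, so the score definition (classifier posterior of the expected phone at each frame), the weighting scheme, and the names `articulation_scores`, `augmented_loss`, `lambda_reg`, and `lambda_con` are all assumptions made for this example.

```python
import numpy as np

def articulation_scores(posteriors, phone_ids):
    """Assumed score: the phone classifier's posterior for the expected
    phone at each frame. posteriors has shape (T, P), phone_ids shape (T,)."""
    return posteriors[np.arange(len(phone_ids)), phone_ids]

def augmented_loss(target_mel, generated_mel, orig_post, gen_post, phone_ids,
                   lambda_reg=1.0, lambda_con=1.0, eps=1e-8):
    """Hypothetical augmented reconstruction loss with two extra terms."""
    # Standard reconstruction loss: per-frame L1 distance, averaged over frames.
    per_frame = np.mean(np.abs(target_mel - generated_mel), axis=1)
    recon = per_frame.mean()

    s_orig = articulation_scores(orig_post, phone_ids)  # original speech
    s_gen = articulation_scores(gen_post, phone_ids)    # generated speech

    # Regularization (assumed form): cancel part of the reconstruction
    # penalty on frames where the original speech is poorly articulated,
    # so the model is not forced to copy distorted targets.
    reg = -np.mean((1.0 - s_orig) * per_frame)

    # Consistency (assumed form): cross-entropy pushing the generated
    # speech toward the expected phone at every frame.
    con = -np.mean(np.log(np.clip(s_gen, eps, 1.0)))

    total = recon + lambda_reg * reg + lambda_con * con
    return {"recon": recon, "reg": reg, "con": con, "total": total}
```

With `lambda_reg = 1`, the first two terms in this sketch reduce to a reconstruction loss weighted by the articulation score of the original frames. In a real training setup the same computation would be written in a differentiable framework so that gradients flow through the phone classifier into the synthesizer.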
Keywords
Personalized speech synthesis, articulation disorder, learning from noisy labels