Measuring The Effect Of Linguistic Resources On Prosody Modeling For Speech Synthesis
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract
The generation of natural and expressive prosodic contours is an important component of a text-to-speech (TTS) system. In most classical architectures, it relies on a text-analysis processor that extracts prosody-predictive features and passes them to a statistical learning model. These features range from basic properties of the input string to rich high-level features that may not always be available when developing a TTS system for a new language with sparse computational resources. In this work we investigate how the prosody model of a speech-synthesis system performs as a function of different predictive feature sets, each of which assumes access to a certain amount of rich resources. Using objective metrics, we investigate the effect of relaxing the assumptions on input representations for prosody prediction in five languages, and we evaluate the perceptual implications for US English.
Keywords
prosody prediction, speech synthesis, low resources