Word Emphasis Prediction for Expressive Text to Speech

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES (2018)

Abstract
Word emphasis prediction is an important part of expressive prosody generation in modern Text-To-Speech (TTS) systems. We present a method for predicting emphasized words for expressive TTS, based on a Deep Neural Network (DNN). We show that the presented method outperforms machine learning methods based on hand-crafted features in terms of objective metrics such as precision and recall. Using a listening test, we further demonstrate that the contribution of the predicted emphasized words to the expressiveness of the synthesized speech is subjectively perceivable.
Keywords
word emphasis, speech synthesis, expressive text to speech, prosody, deep learning