Syllable-Level Prominence Detection With Acoustic Evidence

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 (2010)

Abstract
Accurate prominence annotation benefits many spoken language understanding tasks as well as speech synthesis. In this work, we conduct a thorough study using acoustic prosodic cues for prominence detection in speech. This study is different from previous work in several aspects. In addition to the widely used prosodic features, such as pitch, energy, and duration, we introduce the use of cepstral features. Furthermore, we evaluate the effect of different features, speaker dependency and variation, different classifiers, and contextual information. Our experiments on the Boston University Radio News Corpus show that although the cepstral features alone do not perform well, when combined with prosodic features they yield some performance gain and, more importantly, can reduce much of the speaker variation in this task. We find that the previous context is more informative than the following context, and their combination achieves the best performance. The final result using selected features with context information is significantly better than that in previous work.
Keywords
prosody, prominence, pitch accent, speaker variation
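
As a rough illustration of the feature combination and context use described in the abstract, the sketch below concatenates per-syllable prosodic and cepstral vectors and then appends the vectors of the previous and following syllables. It is only a minimal sketch under stated assumptions: the function name `add_context`, the zero-padding at utterance edges, the window sizes, and the toy feature values are all hypothetical and are not taken from the paper.

```python
def add_context(features, n_prev=1, n_next=1):
    """Append neighbouring syllables' features to each syllable's vector.

    features: list of equal-length lists, one per syllable.
    Positions outside the utterance are zero-padded (an assumption,
    not something the abstract specifies).
    """
    n_dim = len(features[0]) if features else 0
    pad = [0.0] * n_dim
    out = []
    for i in range(len(features)):
        row = []
        # Window runs from the earliest previous syllable to the last following one.
        for j in range(i - n_prev, i + n_next + 1):
            row.extend(features[j] if 0 <= j < len(features) else pad)
        out.append(row)
    return out


# Hypothetical toy values: 3 prosodic dims (pitch, energy, duration)
# and 2 cepstral dims for each of 3 syllables.
prosodic = [[0.1, 0.5, 0.2], [0.9, 0.8, 0.4], [0.3, 0.2, 0.1]]
cepstral = [[1.0, -1.0], [0.5, 0.0], [-0.2, 0.3]]

# Combine the two feature families per syllable, then add context.
combined = [p + c for p, c in zip(prosodic, cepstral)]
with_context = add_context(combined, n_prev=1, n_next=1)
```

Each row of `with_context` could then be passed to any classifier. Setting `n_next=0` or `n_prev=0` gives the previous-only and following-only variants that the abstract compares.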