Symbolic Modeling of Prosody: From Linguistics to Statistics

IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) (2015)

Abstract

The assignment of prosodic events (accent and phrasing) from text is crucial in text-to-speech synthesis systems. This paper addresses the combination of linguistic and metric constraints for the assignment of prosodic events in text-to-speech synthesis. First, a linguistic processing chain is used to provide a rich linguistic description of a text. Then, a novel statistical representation based on a hierarchical HMM (HHMM) is used to model the prosodic structure of a text: the root layer represents the text, each intermediate layer represents a sequence of intermediate phrases, the pre-terminal layer represents the sequence of accents, and the terminal layer represents the sequence of linguistic contexts. For each intermediate layer, a segmental HMM and information fusion are used to combine the linguistic and metric constraints for the segmentation of a text into phrases. Experiments conducted on multi-speaker databases with various speaking styles show that the rich linguistic representation drastically improves the assignment of prosodic events, and that the fusion of linguistic and metric constraints significantly improves over standard methods for the segmentation of a text into phrases. These results constitute substantial advances that can be further used to model the speech prosody of a speaker, a speaking style, and emotions for text-to-speech synthesis.
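
The abstract does not spell out how the linguistic and metric constraints are fused; the keyword list mentions Dempster-Shafer fusion. As a minimal sketch under that assumption, the snippet below applies Dempster's rule of combination to two belief sources voting on whether a phrase boundary follows a word. The function name `dempster_combine`, the two-hypothesis frame, and all mass values are hypothetical illustrations, not the paper's implementation or data.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: the product mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Contradictory evidence: accumulate as conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical frame of discernment: is there a phrase boundary after a word?
B = frozenset({"boundary"})
N = frozenset({"no_boundary"})
U = B | N  # mass on the whole frame expresses ignorance

# Made-up masses standing in for a linguistic and a metric source.
linguistic = {B: 0.6, N: 0.1, U: 0.3}
metric = {B: 0.5, N: 0.2, U: 0.3}

fused = dempster_combine(linguistic, metric)
```

When both sources lean toward a boundary, as here, the fused mass on `boundary` ends up higher than either source assigns on its own, which is the intuition behind combining complementary constraints.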

Keywords

hidden Markov models, linguistics, speech synthesis, accent event, hierarchical HMM, hierarchical hidden Markov model, information fusion, linguistic constraint, linguistic description, linguistic processing chain, metric constraint, phrasing event, prosodic events, prosody symbolic modeling, segmental HMM, statistical representation, statistics, text prosodic structure, text segmentation, text-to-speech synthesis system, Dempster-Shafer fusion, hierarchical HMMs, segmental HMMs, speaking style, speech prosody, surface/deep syntactic parsing, text-to-speech synthesis, measurement, speech, pragmatics, speech processing
