Detection Of Word Fragments In Mandarin Telephone Conversation

Cheng-Tao Chu,Yun-Hsuan Sung,Yuan Zhao,Dan Jurafsky

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5（2006）

引用 30|浏览18

暂无评分

摘要

We describe preliminary work on the detection of word fragments in Mandarin conversational telephone speech. We extracted prosodic, voice quality, and lexical features, and trained Decision Tree and SVM classifiers. Previous research shows that glottalization features are instrumental in English fragment detection. However, we show that Mandarin fragments are quite different than English; 90% of Mandarin fragments are followed immediately by a repetition of the fragmentary word. These repetition fragments are not glottalized, and they have a very specific distribution; the 12 most frequent words ("you", "I", "that", "have", "then", etc.) cover 50% of the tokens of these fragments. Thus rather than glottalization, we found the most useful feature for Mandarin fragment detection was the identity of the neighboring character (word or morpheme). In an oracle experiment using the true (reference) neighboring words as well as prosodic and voice quality features, we achieved 80% accuracy in Mandarin fragment detection.

查看译文

关键词

Disfluencies,Word Fragments,Mandarin,Prosody,Voice Quality

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要