Consonant discrimination in elicited and spontaneous speech: a case for signal-adaptive front ends in ASR

INTERSPEECH (2000)

Abstract
The constant frame length in typical ASR front ends is too long to capture transient phenomena in speech, such as stop bursts. However, current HMM systems have consistently outperformed systems based solely on non-uniform units. This work investigates an approach to "add back" such transient information to a speech recognizer, without losing the robustness of the standard acoustic models. We demonstrate a set of phonetically motivated acoustic features that discriminate a preliminary test set of highly ambiguous voiceless stops in CV contexts. The features are automatically computed from data that had been hand-marked for consonant burst location and voicing onset (extension to automatic marking is also proposed). Two corpora are processed using a parallel set of features: conversational speech over the telephone (Switchboard), and a corpus of carefully elicited speech. The latter provides an upper bound on discrimination, and allows for comparison of feature usage across speaking style. We explore data-driven approaches to obtaining variable-length time-localized features compatible with an HMM statistical framework. We also suggest techniques for extension to automatic annotation of burst location, for computation of features at such points, and for augmentation of an HMM system with the added information.
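To illustrate why a fixed frame can blur a stop burst, here is a minimal Python sketch of two time-localized measurements anchored at hand-marked points. It is not the feature set used in the paper, which the abstract does not specify. The function names, the 5 ms burst window, and the 25 ms fixed frame are all assumptions chosen for illustration.

```python
import math


def voice_onset_time_ms(burst_s, voicing_s):
    """Interval from the marked burst to the marked voicing onset, in ms."""
    return (voicing_s - burst_s) * 1000.0


def log_energy(samples, sample_rate, start_s, length_s):
    """Log mean energy of the window [start_s, start_s + length_s)."""
    lo = max(0, int(round(start_s * sample_rate)))
    hi = min(len(samples), lo + int(round(length_s * sample_rate)))
    if hi <= lo:
        raise ValueError("window lies outside the signal")
    mean_sq = sum(x * x for x in samples[lo:hi]) / (hi - lo)
    return math.log(mean_sq + 1e-12)  # small floor avoids log(0) on silence


def burst_features(samples, sample_rate, burst_s, voicing_s, burst_win_s=0.005):
    """Hypothetical burst-anchored features; burst_win_s is an assumed value."""
    return {
        "vot_ms": voice_onset_time_ms(burst_s, voicing_s),
        "burst_log_energy": log_energy(samples, sample_rate, burst_s, burst_win_s),
    }


def synthetic_stop(sample_rate=8000, burst_s=0.05, burst_len_s=0.005, total_s=0.1):
    """Toy signal: silence with a short full-scale burst (assumed test data)."""
    samples = [0.0] * int(round(total_s * sample_rate))
    start = int(round(burst_s * sample_rate))
    for i in range(start, start + int(round(burst_len_s * sample_rate))):
        samples[i] = 1.0 if i % 2 == 0 else -1.0
    return samples
```

On the toy signal from `synthetic_stop`, the 5 ms window at the burst holds only burst samples. A 25 ms frame covering the same region averages the burst with the surrounding silence, so its log energy is lower. This is one simple way to see the loss of transient detail that the abstract attributes to constant frame lengths.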
Keywords
upper bound, front end