A Two-Pass Framework of Mispronunciation Detection and Diagnosis for Computer-Aided Pronunciation Training.

Xiaojun Qian, Helen M. Meng, Frank K. Soong

IEEE/ACM Trans. Audio, Speech & Language Processing (2016)

Abstract

This paper presents a two-pass framework for mispronunciation detection and diagnosis (MD&D), in which detection is followed by diagnosis, without the need for explicit error pattern modeling, so that the main effort can be devoted to improving acoustic modeling by discriminative training (or by applying alternative models such as neural nets). The framework instantiates a set of anti-phones and a filler model in addition to the original phone model set, and crafts a general and compact phone error detection network. The detection network guarantees full coverage of all possible error patterns while maximally exploiting the constraint offered by the text prompt. Specifically, it includes anti-phones to detect substitutions, a filler model to detect insertions, and skips to detect deletions, so there are no prior assumptions about the possible form of the error patterns. The subsequent diagnosis step expands the detected insertions and substitutions into phone networks, after which another recognition pass reveals the true identities of the detected errors. The crux of the approach is to reduce the modeling and recognition granularity in the detection pass. Discriminative training (DT) of the detection and diagnosis models, by minimizing the two expected full-sequence phone-level errors in the respective passes, reduces the overall phone-level MD&D error by a relative 40%. In particular, visualization of the models in the framework shows that discriminative training effectively separates the canonical phones from their anti-phones.
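The detection network described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact topology, so this is an assumption: each canonical phone in the prompt gets three parallel arcs (the phone itself, its anti-phone, and a skip), and a filler self-loop at every state absorbs inserted segments. The names `Arc`, `build_detection_network`, the `anti-` prefix, and the `<eps>` skip label are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int     # source state index
    dst: int     # destination state index
    label: str   # model name, or "<eps>" for a skip (hypothetical labels)
    kind: str    # "correct", "substitution", "deletion", or "insertion"


def build_detection_network(prompt_phones):
    """Build an illustrative detection network for a list of canonical phones.

    Assumed topology, not taken from the paper: state i sits before phone i,
    and the final state follows the last phone.
    """
    arcs = []
    for i, phone in enumerate(prompt_phones):
        # Filler self-loop: absorbs segments inserted before this phone.
        arcs.append(Arc(i, i, "filler", "insertion"))
        # Canonical phone: the learner pronounced it as prompted.
        arcs.append(Arc(i, i + 1, phone, "correct"))
        # Anti-phone: something other than the canonical phone was produced.
        arcs.append(Arc(i, i + 1, "anti-" + phone, "substitution"))
        # Skip arc: the canonical phone was deleted.
        arcs.append(Arc(i, i + 1, "<eps>", "deletion"))
    final_state = len(prompt_phones)
    # Filler self-loop on the final state: absorbs trailing insertions.
    arcs.append(Arc(final_state, final_state, "filler", "insertion"))
    return arcs, final_state


arcs, final_state = build_detection_network(["k", "ae", "t"])
print(len(arcs))  # 13: four arcs per prompt phone plus one trailing filler loop
```

Under this assumed layout the network grows linearly with the prompt length, which is one way to read the abstract's "general and compact" claim: no per-phone list of expected confusions is enumerated in the detection pass.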

Keywords

Hidden Markov models, Acoustics, Speech, Training, Speech processing, Feature extraction
