Adding Robustness to Language Models for Spontaneous Speech Recognition

msra(2013)

Citations: 24 | Views: 4
Abstract
Compared to dictation systems, recognition systems for spontaneous speech still perform rather poorly. An important weakness in these systems is the statistical language model, mainly due to the lack of large amounts of stylistically matching training data and to the occurrence of disfluencies in the recognition input. In this paper we investigate a method for improving the robustness of a spontaneous language model by flexible manipulation of the prediction context when disfluencies occur. In the case of repetitions, we obtained significantly better recognition results on a benchmark Switchboard test set.

… conversations (Switchboard). Disfluencies almost always occurred in the sentence's given information part. (6) explore N-best list rescoring on the basis of chunking information; the underlying motivation is that the coverage of the chunker carries information that helps discriminate between syntactically acceptable and syntactically anomalous recognition hypotheses. The technique reduced the WER by 0.3% absolute on Switchboard. Finally, (7) report on dealing with disfluencies in language modeling by editing the prediction context: the prediction context for a newly hypothesized word is cleaned up by removing the disfluencies in it. As with the other approaches, the resulting WER improvement on Switchboard is not significant. The research described in this paper extends the latter work by implementing a more flexible manipulation of the prediction context: disfluencies are only removed from the context when they do not contain informational value.

The paper is organized as follows. First, we discuss the investigated disfluencies and the proposed model to handle them. Next, the experimental set-up is described and results on the Switchboard task are given. Finally, we conclude and discuss future research on the topic.
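To make the context-editing idea concrete, the following is a minimal sketch, not the authors' implementation: a detected repetition is dropped from the n-gram prediction context only when the repeated word is judged to carry no informational value. The informational-value test, the function names, and the n-gram order used here are illustrative assumptions.

# Sketch of editing the prediction context when a repetition disfluency occurs.
# The hypothesis itself is left unchanged; only the history the language model
# conditions on is cleaned up before predicting the next word.

# Hypothetical stand-in for an informational-value test: repeated function
# words are treated as plain disfluencies, repeated content words are kept.
FUNCTION_WORDS = {"the", "a", "an", "and", "i", "you", "it", "to", "uh", "um"}

def carries_information(word):
    """Placeholder heuristic, not the paper's actual criterion."""
    return word.lower() not in FUNCTION_WORDS

def prediction_context(history, order=3):
    """Return the (order-1)-word context for the next prediction, dropping the
    second copy of an immediate repetition when it carries no information."""
    context = list(history)
    if (len(context) >= 2 and context[-1] == context[-2]
            and not carries_information(context[-1])):
        del context[-1]            # edit the disfluent repetition out of the context
    return context[-(order - 1):]

if __name__ == "__main__":
    # For "i i went to the the ...", the repeated function word "the" is edited
    # out, so the trigram model predicts from the fluent history ['to', 'the'].
    history = ["i", "i", "went", "to", "the", "the"]
    print(prediction_context(history))   # -> ['to', 'the']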