A Study Of Unsupervised Clustering Techniques For Language Modeling

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5(2008)

Abstract
There has been recent interest in clustering text data to build topic-specific language models for large vocabulary speech recognition. In this paper, we studied various unsupervised clustering algorithms on several corpora. First we compared the clustering methods with quality metrics such as entropy and purity. Of the techniques studied, two-phase bisecting K-means achieved good performance with relatively fast speed. Then we performed speech recognition experiments on English and Arabic systems using the automatically derived topic-based language models. We obtained modest word error rate improvements, comparable to previously published studies. A careful analysis of the correlation between word error rate and the distribution of misrecognized words, including an information-gain metric, is presented.
Keywords
Clustering, Language Model Adaptation, Entropy
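
The abstract compares clustering methods by entropy and purity but does not state the formulas used. As a rough illustration only, the sketch below computes both metrics under their commonly used definitions (purity as the fraction of documents that fall in their cluster's majority topic, entropy as the size-weighted average of per-cluster topic entropy, in bits). The function name `cluster_quality` and the toy labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def cluster_quality(cluster_ids, topic_labels):
    """Return (purity, entropy) for a clustering against reference topics.

    Assumed definitions (not confirmed by the paper):
      purity  = sum over clusters of (majority-topic count) / N
      entropy = sum over clusters of (cluster size / N) * H(topics in cluster)
    Higher purity and lower entropy indicate cleaner clusters.
    """
    if len(cluster_ids) != len(topic_labels) or not cluster_ids:
        raise ValueError("inputs must be non-empty and of equal length")

    # Count how many documents of each topic land in each cluster.
    by_cluster = defaultdict(Counter)
    for cluster, topic in zip(cluster_ids, topic_labels):
        by_cluster[cluster][topic] += 1

    n = len(cluster_ids)
    purity = 0.0
    entropy = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        purity += max(counts.values()) / n
        # Shannon entropy (base 2) of the topic distribution in this cluster.
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        entropy += (size / n) * h
    return purity, entropy


# Toy example: six documents, two clusters, two reference topics.
purity, entropy = cluster_quality(
    [0, 0, 0, 1, 1, 1],
    ["a", "a", "b", "b", "b", "b"],
)
print(f"purity={purity:.3f} entropy={entropy:.3f}")
```

In this toy case the second cluster is pure and contributes zero entropy, so the overall score is driven entirely by the mixed first cluster.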