New Entropy-Based Vocabulary Optimization Approach For Chinese Language Modeling

PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (NLP-KE'07)(2007)

Abstract
This paper proposes a new entropy-based vocabulary optimization approach for Chinese language modeling. The approach aims to optimize the language model directly by extending the vocabulary, that is, to minimize the character perplexity of the language model. A new criterion for selecting new words is developed from the character perplexity metric. A fast computing method and a simple divide-and-conquer method are proposed to handle very large corpora. Experiments showed about a 3% reduction in character perplexity and a 3% reduction in character error rate in a speech recognition task. Comparison experiments against other approaches were also conducted.
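The abstract does not spell out the selection criterion or the fast computing method, so the Python sketch below is only a simplified illustration of the underlying idea, not the authors' algorithm. It assumes a unigram model with maximum-likelihood estimates, greedy forward maximum-matching segmentation, and evaluation on the same text used for estimation; the names `segment`, `char_perplexity`, and `perplexity_gain` are hypothetical. Character perplexity is exp(-(1/N_c) * sum_i log P(w_i)), where N_c is the number of characters rather than words, which keeps models built on different vocabularies comparable. A candidate word is scored by how much adding it to the vocabulary lowers that value.

```python
import math
from collections import Counter


def segment(text, vocab, max_len):
    """Greedy forward maximum matching; single characters are always accepted."""
    words, i = [], 0
    while i < len(text):
        # Try the longest possible piece first, falling back to one character.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in vocab:
                words.append(piece)
                i += n
                break
    return words


def char_perplexity(words):
    """Unigram MLE estimated and evaluated on `words`, normalized per character."""
    counts = Counter(words)
    total = sum(counts.values())
    log_prob = sum(math.log(counts[w] / total) for w in words)
    n_chars = sum(len(w) for w in words)
    return math.exp(-log_prob / n_chars)


def perplexity_gain(text, vocab, candidate):
    """Hypothetical criterion: drop in character perplexity when `candidate` is added."""
    max_len = max([len(w) for w in vocab] + [len(candidate)])
    before = char_perplexity(segment(text, vocab, max_len))
    after = char_perplexity(segment(text, vocab | {candidate}, max_len))
    return before - after
```

For example, on the toy string "ababab" with an empty vocabulary, adding "ab" lowers the character perplexity from 2 to 1, so its gain is 1. A real system would estimate probabilities on training data, score perplexity on held-out text, and use a smoothed higher-order model.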
Keywords
language model, natural language processing, divide and conquer, speech recognition