New Entropy-Based Vocabulary Optimization Approach For Chinese Language Modeling
PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (NLP-KE'07)(2007)
Abstract
This paper proposes a new entropy-based vocabulary optimization approach for Chinese language modeling. The approach aims to optimize the language model directly by extending the vocabulary, that is, to minimize the character perplexity of the language model. A new criterion for new-word selection was developed based on the character perplexity metric. A fast computing method and a simple divide-and-conquer method were proposed to deal with very large corpora. Experiments showed about a 3% character perplexity reduction and a 3% character error rate reduction in a speech recognition task. Comparison experiments with other approaches were also conducted.
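To illustrate why character perplexity, rather than word perplexity, is the quantity to minimize when the vocabulary itself changes, here is a minimal sketch. It assumes a maximum-likelihood unigram word model evaluated on the same pre-segmented text, and a hypothetical greedy step that adds one new word by merging an adjacent word pair. This is not the paper's selection criterion, fast computing method, or divide-and-conquer scheme. The names `char_perplexity`, `merge_pair`, and `best_new_word` are illustrative only. The key point is that log-probability is normalized by the number of characters, which stays fixed under any re-segmentation. That normalization is what makes models with different vocabularies comparable.

```python
import math
from collections import Counter


def char_perplexity(segmented_corpus):
    """Character perplexity of a maximum-likelihood unigram word model.

    segmented_corpus: list of sentences, each a list of word strings.
    The total log-probability is summed over word tokens but divided by
    the number of characters, so different vocabularies are comparable.
    """
    counts = Counter(w for sent in segmented_corpus for w in sent)
    total_words = sum(counts.values())
    log_prob = sum(c * math.log(c / total_words) for c in counts.values())
    n_chars = sum(len(w) * c for w, c in counts.items())
    return math.exp(-log_prob / n_chars)


def merge_pair(segmented_corpus, pair):
    """Re-segment by joining every adjacent occurrence of `pair` into one word."""
    a, b = pair
    merged = []
    for sent in segmented_corpus:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and sent[i] == a and sent[i + 1] == b:
                out.append(a + b)  # the candidate new vocabulary entry
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


def best_new_word(segmented_corpus):
    """Hypothetical greedy step: pick the adjacent pair whose merge most
    lowers character perplexity. Returns (pair or None, old_ppl, new_ppl)."""
    base = char_perplexity(segmented_corpus)
    pairs = {(s[i], s[i + 1]) for s in segmented_corpus for i in range(len(s) - 1)}
    best, best_ppl = None, base
    for pair in sorted(pairs):
        ppl = char_perplexity(merge_pair(segmented_corpus, pair))
        if ppl < best_ppl:
            best, best_ppl = pair, ppl
    return best, base, best_ppl
```

For example, on the toy corpus `[["语", "言", "模", "型"], ["语", "言", "处", "理"], ["语", "言", "模", "型"]]`, merging `("语", "言")` gives the largest drop in character perplexity. A practical system would instead score candidates with a smoothed n-gram model on held-out text, because in-sample maximum-likelihood scores tend to favor merging.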
Keywords
language model,natural language processing,divide and conquer,speech recognition