Class-Based Language Models for Chinese-English Parallel Corpus

CICLing (2)(2013)

Abstract
This paper addresses the use of novel class-based language models on parallel corpora, focusing specifically on English and Chinese. We find that the perplexity of Chinese is generally much higher than that of English and discuss possible reasons. We demonstrate the relative effectiveness of class-based models over the modified Kneser-Ney trigram model for our task. We also introduce rare-events clustering and a polynomial discounting mechanism, which are shown to improve results. Our experimental results on parallel corpora indicate that the improvements due to classes are similar for English and Chinese. This suggests that class-based language models should be used for both languages.
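For readers unfamiliar with the terms, the following is a minimal sketch of a standard class-based bigram model and its perplexity, using the usual factorization P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i). It is not the paper's method: the novel models, rare-events clustering, and polynomial discounting described in the abstract are not reproduced here. The function names, the toy corpus, the hand-written `word2class` mapping, and the add-one smoothing are all assumptions made for illustration, and the sketch assumes every evaluated word was seen in training.

```python
import math
from collections import Counter

def train_class_bigram(sentences, word2class):
    """Collect counts for P(c_i | c_{i-1}) and P(w_i | c_i)."""
    class_bigrams = Counter()   # (previous class, current class) counts
    class_context = Counter()   # how often each class (or <s>) is the context
    word_counts = Counter()     # word occurrence counts
    class_counts = Counter()    # class occurrence counts
    for sent in sentences:
        prev = "<s>"            # hypothetical sentence-start symbol
        for w in sent:
            c = word2class[w]
            class_bigrams[(prev, c)] += 1
            class_context[prev] += 1
            word_counts[w] += 1
            class_counts[c] += 1
            prev = c
    return class_bigrams, class_context, word_counts, class_counts

def perplexity(sentences, word2class, model, num_classes):
    """Per-word perplexity; assumes every word was seen in training."""
    class_bigrams, class_context, word_counts, class_counts = model
    log_prob, n = 0.0, 0
    for sent in sentences:
        prev = "<s>"
        for w in sent:
            c = word2class[w]
            # Add-one smoothed class transition (an assumption, not Kneser-Ney).
            p_class = (class_bigrams[(prev, c)] + 1) / (class_context[prev] + num_classes)
            # Maximum-likelihood probability of the word given its class.
            p_word = word_counts[w] / class_counts[c]
            log_prob += math.log2(p_class * p_word)
            n += 1
            prev = c
    return 2 ** (-log_prob / n)

# Toy usage with a hand-written class mapping (illustrative only).
word2class = {"the": "DET", "a": "DET", "cat": "N", "dog": "N",
              "runs": "V", "sleeps": "V"}
train = [["the", "cat", "runs"], ["a", "dog", "sleeps"]]
model = train_class_bigram(train, word2class)
ppl = perplexity(train, word2class, model, num_classes=3)
print(round(ppl, 2))
```

Lower perplexity means the model assigns higher probability to the text. Comparing perplexities across languages, as the abstract does for Chinese and English, depends on how each language is tokenized.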
Keywords
rare event, modified Kneser-Ney trigram model, novel class-based language model, Chinese language, possible reason, parallel corpus, class-based language model, polynomial discounting mechanism, class-based model, Chinese-English parallel corpus