
A Comparison of Chinese Word Segmentation on News and Microblog Corpora with a Lexicon Based Method.

CIPS-SIGHAN(2012)

Abstract
Microblogs are a new and important form of social media. Can traditional methods handle Chinese microblog word segmentation well? We adopt the forward maximum matching (FMM) method and design rules to recognize words containing non-Chinese characters. We focus on comparing results between news text and microblog text. The lexicon-based method lets us examine new words emerging in microblogs by comparing them against lexicon words. Experimental results show that, under the same setup, performance on microblog text is better than on news text, which may signal that microblog word segmentation is not as hard as expected.
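As an illustration of the general technique named in the abstract, the following is a minimal sketch of forward maximum matching with one rule for non-Chinese tokens. It is not the authors' implementation: the function name `fmm_segment`, the `NON_CHINESE` pattern, the toy lexicon, and the choice to skip whitespace are all assumptions made for this example, and the paper's actual rules are not given here.

```python
import re

# Hypothetical rule (an assumption, not taken from the paper): a run of ASCII
# letters/digits, optionally joined by common punctuation, is kept as one token.
NON_CHINESE = re.compile(r"[A-Za-z0-9]+(?:[._@\-/:][A-Za-z0-9]+)*")


def fmm_segment(text, lexicon, max_len=None):
    """Segment text left to right, always taking the longest lexicon match."""
    if max_len is None:
        # The longest lexicon entry bounds the matching window.
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            # Whitespace is skipped rather than emitted as a token.
            i += 1
            continue
        m = NON_CHINESE.match(text, i)
        if m:
            # Rule-based branch: keep the whole non-Chinese run together.
            tokens.append(m.group())
            i = m.end()
            continue
        # Lexicon branch: try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    toy_lexicon = {"研究", "研究生", "生命", "起源", "微博"}
    print(fmm_segment("研究生命起源", toy_lexicon))
    print(fmm_segment("我用iPhone5发微博", toy_lexicon))
```

In this sketch, characters that fall back to single-character tokens are the ones the lexicon does not cover, which is one simple way to surface candidate new words when comparing output against a lexicon.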
Keywords
Page Segmentation