GeoSegmenter: A statistically learned Chinese word segmenter for the geoscience domain

Computers & Geosciences (2015)

Abstract
Unlike English, written Chinese has no spaces between words. Segmenting text into words, known as the Chinese word segmentation (CWS) problem, is therefore a fundamental issue in processing Chinese documents and the first step in many text mining applications, including information retrieval, machine translation, and knowledge acquisition. For the geoscience domain, however, the CWS problem remains unsolved. Although generic segmenters can be applied to geoscience documents, they lack domain-specific knowledge, and consequently their segmentation accuracy drops dramatically.
Keywords
Chinese word segmentation, Conditional random fields, Natural language processing