Chinese composite words tagging using selective back-off smoothing

Semantic Scholar (2015)

Abstract
The aim of this research is to tag unknown Chinese words with their part-of-speech (POS). Even narrow coverage of unknown words produces explosive ambiguity in natural language processing. At the same time, completely unsupervised and refined POS tagging is impossible without any help from lexicographers. In this research, we propose a means of unlocking POS tags based on two important features: word structure and word sequence in raw text. A similarity-based technique is employed to classify an unknown word using its orthographic form and its contextual neighbors, without becoming trapped in a subjective linguistic quagmire. The technique produces a good estimate of the POS tags of Chinese compound words before they are fed into a tagger. A recursive inferential mechanism is also devised to alleviate the ripple effect caused by changes made to a word's neighbors during tagging. The approach is justified with a compound-word database of more than 53,500 words. Experimental results with 500,000 words show that the approach outperforms its counterparts.

* Corresponding author: swkchan@cuhk.edu.hk

Linguistic Corpus and Corpus Linguistics in the Chinese Context. Journal of Chinese Linguistics Monograph Series 25 (2015): 139-159. ©2015 by Journal of Chinese Linguistics. All rights reserved. 2409-2878/2015/25-0007
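The abstract does not spell out the back-off procedure, so the following is only a minimal sketch of the general idea of guessing a tag for an unknown compound from its word structure. It assumes a plain longest-suffix back-off: look up the longest known suffix of the word, fall back to shorter suffixes, and finally to the most frequent tag overall. This is not the paper's selective back-off smoothing, and it ignores the contextual-neighbor feature entirely. The function names (`train`, `guess_tag`), the tag labels, and the toy lexicon are all invented for illustration.

```python
from collections import Counter, defaultdict

# Toy lexicon of (word, tag) pairs. Words and tags are illustrative
# assumptions, not data from the paper.
TOY_LEXICON = [
    ("研究员", "N"),
    ("飞行员", "N"),
    ("科学家", "N"),
    ("现代化", "V"),
    ("工业化", "V"),
]


def train(lexicon):
    """Count tags for every suffix of every known word, plus overall tag counts."""
    suffix_counts = defaultdict(Counter)
    tag_counts = Counter()
    for word, tag in lexicon:
        tag_counts[tag] += 1
        for k in range(1, len(word) + 1):
            suffix_counts[word[-k:]][tag] += 1
    return suffix_counts, tag_counts


def guess_tag(word, suffix_counts, tag_counts):
    """Return the most frequent tag for the longest suffix seen in training.

    Backs off from the whole word to ever shorter suffixes, and finally
    to the most frequent tag in the lexicon when no suffix matches.
    """
    for k in range(len(word), 0, -1):
        counts = suffix_counts.get(word[-k:])
        if counts:
            return counts.most_common(1)[0][0]
    return tag_counts.most_common(1)[0][0]


suffix_counts, tag_counts = train(TOY_LEXICON)
```

With this toy lexicon, `guess_tag("运动员", suffix_counts, tag_counts)` finds no entry for the whole word or for "动员" and backs off to the final character "员", which was only ever seen with the tag "N". A word that shares no suffix with the lexicon falls through to the overall majority tag.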