A machine learning parser using an unlexicalized distituent model

Samuel W. K. Chan, Lawrence Y. L. Cheung, Mickey W. C. Chong

COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (2010)

Abstract

Despite the popularity of lexicalized parsing models, practical concerns such as data sparseness and applicability to domains with different vocabularies mean that unlexicalized models, which do not refer to the word tokens themselves, deserve more attention. We developed a classifier-based parser that uses an unlexicalized parsing model. A machine learning method integrates linguistic attributes and information-theoretic attributes in two tasks, namely sentence chunking and phrase recognition. Most importantly, to enhance accuracy on these tasks, we investigated the notion of distituency (the possibility that two parts of speech cannot remain in the same constituent or phrase) and incorporated it as attributes using various statistical measures. The parser was applied to parsing English and Chinese sentences in the Penn Treebank and the Tsinghua Chinese Treebank. It achieved an F-score of 80.3% for English and 82.4% for Chinese.
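
To make the idea of a distituency attribute concrete, the following is a minimal sketch that scores adjacent part-of-speech tag pairs with pointwise mutual information (PMI). PMI is chosen here only as one plausible information-theoretic measure; the abstract does not say which measures the parser uses, and the function name `adjacent_tag_pmi` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def adjacent_tag_pmi(tag_sequences):
    """Return PMI (base 2) for every adjacent POS-tag pair in the corpus.

    Assumption: PMI stands in for the unspecified statistical measures
    mentioned in the abstract. No word tokens are used, only tags.
    """
    left = Counter()   # how often a tag appears as the left member of a pair
    right = Counter()  # how often a tag appears as the right member of a pair
    pairs = Counter()  # how often each ordered pair appears
    for tags in tag_sequences:
        for a, b in zip(tags, tags[1:]):
            left[a] += 1
            right[b] += 1
            pairs[(a, b)] += 1
    total = sum(pairs.values())
    # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) )
    return {
        (a, b): math.log2(n * total / (left[a] * right[b]))
        for (a, b), n in pairs.items()
    }


# Hypothetical toy corpus of POS-tag sequences (Penn-style tag names).
corpus = [
    ["DT", "NN", "VBD", "DT", "NN"],
    ["DT", "JJ", "NN", "VBD"],
    ["NN", "VBD", "DT", "NN"],
]
scores = adjacent_tag_pmi(corpus)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

On a real treebank, tag pairs with low or negative scores would be candidates for a phrase boundary, and the score could be passed to a classifier as one attribute among others. The toy corpus is far too small for its scores to be linguistically meaningful.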

Keywords

unlexicalized model, phrase recognition, classifier-based parser, parsing performance, Penn Treebank, unlexicalized parsing model, Tsinghua Chinese Treebank, Chinese sentence, parsing English, lexicalized parsing model, unlexicalized distituent model, machine learning, part of speech, parsing
