Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification

COLING '98: Proceedings of the 17th International Conference on Computational Linguistics, Volume 1 (1998)

Abstract
Finding simple, non-recursive, base noun phrases is an important subtask for many natural language processing applications. While previous empirical methods for base NP identification have been rather complex, this paper instead proposes a very simple algorithm that is tailored to the relative simplicity of the task. In particular, we present a corpus-based approach for finding base NPs by matching part-of-speech tag sequences. The training phase of the algorithm is based on two successful techniques: first the base NP grammar is read from a "treebank" corpus; then the grammar is improved by selecting rules with high "benefit" scores. Using this simple algorithm with a naive heuristic for matching rules, we achieve surprising accuracy in an evaluation on the Penn Treebank Wall Street Journal.
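The two training steps and the matching step described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation. The abstract does not define the "benefit" score or the matching heuristic, so the code assumes that benefit is the number of correct matches minus the number of incorrect matches a rule produces on a held-out corpus, and that matching is greedy, left to right, longest rule first. The function names (extract_rules, match, prune), the threshold parameter, and the data layout (a sentence as a list of POS tags plus a list of (start, end) base NP spans) are all hypothetical.

```python
from collections import Counter


def extract_rules(corpus):
    """Read the grammar: every POS-tag sequence bracketed as a base NP."""
    rules = set()
    for tags, spans in corpus:
        for start, end in spans:
            rules.add(tuple(tags[start:end]))
    return rules


def match(tags, rules):
    """Assumed heuristic: greedy left-to-right scan, longest rule first."""
    max_len = max((len(r) for r in rules), default=0)
    found = []
    i = 0
    while i < len(tags):
        # Try the longest candidate span starting at i, then shorter ones.
        for n in range(min(max_len, len(tags) - i), 0, -1):
            if tuple(tags[i:i + n]) in rules:
                found.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no rule matched at position i
    return found


def prune(rules, held_out, threshold=1):
    """Keep rules whose assumed benefit (correct - incorrect) >= threshold."""
    benefit = Counter()
    for tags, gold_spans in held_out:
        gold = set(gold_spans)
        for start, end in match(tags, rules):
            rule = tuple(tags[start:end])
            benefit[rule] += 1 if (start, end) in gold else -1
    # Rules that never fire on the held-out data have benefit 0.
    return {r for r in rules if benefit[r] >= threshold}
```

A typical use would call extract_rules on a training portion of the treebank, prune on a separate held-out portion, and then match on new tagged sentences. The sketch prunes in a single pass; whether pruning should be repeated, and what threshold to use, are left open here.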
Keywords
simple algorithm, base NP grammar, base NP identification, base NPs, base noun phrase, Penn Treebank Wall Street, corpus-based approach, important subtask, naive heuristic, natural language processing application, Error-driven pruning, Treebank grammar, base noun phrase identification