AI Insight
An AI-extracted summary of this paper
Accurate Unlexicalized Parsing
ACL 2003, pages 423–430
Abstract
We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar. Indeed, its performance of 86.36% (LP/LR F1) is better than that of early lexicalized PCFG models, and surprisingly close to the current state-of-the-art.
Introduction
- Several results have brought into question how large a role lexicalization plays in such parsers. Johnson (1998) showed that the performance of an unlexicalized PCFG over the Penn treebank could be improved enormously by annotating each node by its parent category (a sketch of this transform follows this list).
- To the extent that no such strong baseline has been provided, the community has tended to greatly overestimate the beneficial effect of lexicalization in probabilistic parsing, rather than looking critically at where lexicalized probabilities are both needed to make the right decision and available in the training data.
- This result affirms the value of linguistic analysis for feature discovery.
- The authors see this investigation as only one part of the foundation for state-of-the-art parsing which employs both lexical and structural conditioning.
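As a concrete illustration, a minimal Python sketch of parent annotation is given below. The nested-tuple tree encoding and the example sentence are hypothetical conveniences rather than anything from the paper, and this simple version annotates every nonterminal, whereas Johnson (1998) and the paper are more selective about which nodes to mark.

```python
# Sketch of parent annotation (Johnson 1998): each nonterminal label is
# rewritten as "CATEGORY^PARENT", so that, e.g., an NP under S (a subject)
# and an NP under VP (an object) become distinct grammar states, breaking
# a false independence assumption of the vanilla treebank PCFG.
# Trees are hypothetical nested tuples: (label, child, ...) with terminal
# words as plain strings.

def parent_annotate(tree, parent="ROOT"):
    if isinstance(tree, str):                 # terminal word: unchanged
        return tree
    label, *children = tree
    annotated = tuple(parent_annotate(c, label) for c in children)
    return (f"{label}^{parent}", *annotated)  # e.g. NP under S -> "NP^S"

t = ("S",
     ("NP", ("DT", "the"), ("NN", "dog")),
     ("VP", ("VBD", "bit"), ("NP", ("DT", "the"), ("NN", "cat"))))
print(parent_annotate(t))
# -> ('S^ROOT', ('NP^S', ...), ('VP^S', ..., ('NP^VP', ...)))
```

After this transform, rule probabilities estimated from the annotated trees distinguish subject NPs from object NPs, exactly the kind of context a vanilla PCFG cannot see.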
Highlights
- Several results have brought into question how large a role lexicalization plays in such parsers. Johnson (1998) showed that the performance of an unlexicalized probabilistic context-free grammar (PCFG) over the Penn treebank could be improved enormously by annotating each node by its parent category.
- We show that the parsing performance that can be achieved by an unlexicalized PCFG is far higher than has previously been demonstrated, and is much higher than community wisdom has thought possible.
- We present linguistically motivated annotations which do much to close the gap between a vanilla PCFG and state-of-the-art lexicalized models (an illustrative split is sketched after this list).
- We construct an unlexicalized PCFG which outperforms the lexicalized PCFGs of Magerman (1995) and Collins (1996) (though not more recent models, such as Charniak (1997) or Collins (1999)). One benefit of this result is a much-strengthened lower bound on the capacity of an unlexicalized PCFG.
- We have shown that, surprisingly, the maximum-likelihood estimate of a compact unlexicalized PCFG can parse on par with early lexicalized parsers.
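To make the idea of a linguistically motivated state split concrete, here is a hedged sketch of one plausible split in the spirit of the paper, dividing NP according to whether it is possessive. The tuple encoding is the same hypothetical one as in the earlier sketch, and this is an illustration only, not a verbatim reproduction of the paper's split inventory.

```python
# Sketch of a linguistically motivated state split: possessive NPs (those
# ending in a POS tag, as in "the dog 's") rewrite very differently from
# ordinary NPs, so giving them their own state NP-POSS removes a false
# independence assumption. Illustration only; the paper defines its own
# precise inventory of splits.

def split_possessive_np(tree):
    if isinstance(tree, str):
        return tree
    label, *children = tree
    kids = tuple(split_possessive_np(c) for c in children)
    last = children[-1]
    if label == "NP" and not isinstance(last, str) and last[0] == "POS":
        label = "NP-POSS"                     # split state for possessives
    return (label, *kids)

t = ("NP",
     ("NP", ("DT", "the"), ("NN", "dog"), ("POS", "'s")),
     ("NN", "tail"))
print(split_possessive_np(t))
# -> ('NP', ('NP-POSS', ('DT', 'the'), ('NN', 'dog'), ('POS', "'s")), ('NN', 'tail'))
```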
Methods
- To facilitate comparison with previous work, the authors trained the models on sections 2–21 of the WSJ section of the Penn treebank.
- The authors used the first 20 files (393 sentences) of section 22 as a development set.
- All of section 23 was used as a test set for the final model.
- Given a set of transformed trees, the authors viewed the local trees as grammar rewrite rules in the standard way and used maximum-likelihood estimates for rule probabilities. To parse, they used a simple array-based Java implementation of a generalized CKY parser, which, for the final best model, was able to exhaustively parse all sentences in section 23 in 1 GB of memory, taking approximately 3 seconds for average-length sentences (a Python sketch of this train-and-parse pipeline follows this list).
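Below is a minimal Python sketch of that pipeline. It assumes the hypothetical tuple tree encoding from the earlier sketches and a grammar already reduced to binary and lexical rules; the paper's actual array-based Java implementation additionally handles unary rules, markovized binarization, and the engineering needed to parse the full treebank efficiently.

```python
from collections import defaultdict
from math import log

def train_pcfg(trees):
    """MLE rule probabilities: P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = defaultdict(int)
    lhs_counts = defaultdict(int)
    def visit(t):
        if isinstance(t, str):                # terminal word
            return
        label, *children = t
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
        for c in children:
            visit(c)
    for t in trees:
        visit(t)
    return {r: log(n / lhs_counts[r[0]]) for r, n in rule_counts.items()}

def cky_viterbi(words, logprob, root="S"):
    """Viterbi log probability of the best parse, for a grammar in
    binary/lexical form (A -> B C or A -> word)."""
    n = len(words)
    best = defaultdict(lambda: float("-inf"))  # (i, j, label) -> score
    for i, w in enumerate(words):              # lexical rules A -> w
        for (lhs, rhs), lp in logprob.items():
            if rhs == (w,) and lp > best[(i, i + 1, lhs)]:
                best[(i, i + 1, lhs)] = lp
    for span in range(2, n + 1):               # binary rules A -> B C
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, rhs), lp in logprob.items():
                    if len(rhs) != 2:
                        continue
                    s = lp + best[(i, k, rhs[0])] + best[(k, j, rhs[1])]
                    if s > best[(i, j, lhs)]:
                        best[(i, j, lhs)] = s
    return best[(0, n, root)]

# Toy usage: one tree, one possible parse, so the Viterbi log prob is 0.0.
trees = [("S", ("NP", "dogs"), ("VP", "bark"))]
print(cky_viterbi(["dogs", "bark"], train_pcfg(trees), root="S"))
```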
Results
- The authors took the final model and used it to parse section 23 of the treebank. Figure 8 shows the results.
- The test set F1 is 86.32% for sentences of ≤ 40 words, already higher than that of early lexicalized models, though lower than that of state-of-the-art parsers (the F1 metric is defined below).
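For reference, the F1 figure is the standard PARSEVAL harmonic mean of labeled precision (LP) and labeled recall (LR) over the brackets a proposed parse shares with the gold tree; this is the standard definition, not something specific to this paper:

```latex
\mathrm{LP} = \frac{\#\ \text{correct brackets}}{\#\ \text{proposed brackets}},
\qquad
\mathrm{LR} = \frac{\#\ \text{correct brackets}}{\#\ \text{gold brackets}},
\qquad
F_1 = \frac{2 \cdot \mathrm{LP} \cdot \mathrm{LR}}{\mathrm{LP} + \mathrm{LR}}
```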
Conclusion
- The advantages of unlexicalized grammars are clear enough: easy to estimate, easy to parse with, and time- and space-efficient.
- The authors have shown that, surprisingly, the maximum-likelihood estimate of a compact unlexicalized PCFG can parse on par with early lexicalized parsers.
- The authors have shown ways to improve parsing, some easier than lexicalization and others orthogonal to it, which could presumably be used to benefit lexicalized parsers as well.
Funding
- This paper is based on work supported in part by the National Science Foundation under Grant No. IIS-0085896, and in part by an IBM Faculty Partnership Award to the second author.
References
- James K. Baker. 1979. Trainable grammars for speech recognition. In D. H. Klatt and J. J. Wolf, editors, Speech Communication Papers for the 97th Meeting of the Acoustical Society of America, pages 547–550.
- Taylor L. Booth and Richard A. Thompson. 1973. Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22:442–450.
- Sharon A. Caraballo and Eugene Charniak. 1998. New figures of merit for best-first probabilistic chart parsing. Computational Linguistics, 24:275–298.
- Eugene Charniak, Sharon Goldwater, and Mark Johnson. 1998. Edge-based best-first chart parsing. In Proceedings of the Sixth Workshop on Very Large Corpora, pages 127–133.
- Eugene Charniak. 1996. Tree-bank grammars. In Proceedings of the 13th National Conference on Artificial Intelligence, pages 1031–1036.
- Eugene Charniak. 1997. Statistical parsing with a context-free grammar and word statistics. In Proceedings of the 14th National Conference on Artificial Intelligence, pages 598–603.
- Eugene Charniak. 2000. A maximum-entropy-inspired parser. In NAACL 1, pages 132–139.
- Eugene Charniak. 2001. Immediate-head parsing for language models. In ACL 39.
- Noam Chomsky. 1965. Aspects of the Theory of Syntax. MIT Press, Cambridge, MA.
- Michael John Collins. 1996. A new statistical parser based on bigram lexical dependencies. In ACL 34, pages 184–191.
- Michael Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
- Jason Eisner and Giorgio Satta. 1999. Efficient parsing for bilexical context-free grammars and head-automaton grammars. In ACL 37, pages 457–464.
- Marilyn Ford, Joan Bresnan, and Ronald M. Kaplan. 1982. A competence-based theory of syntactic closure. In Joan Bresnan, editor, The Mental Representation of Grammatical Relations, pages 727–796. MIT Press, Cambridge, MA.
- Daniel Gildea. 2001. Corpus variation and parser performance. In 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP).
- Donald Hindle and Mats Rooth. 1993. Structural ambiguity and lexical relations. Computational Linguistics, 19(1):103–120.
- Mark Johnson. 1998. PCFG models of linguistic tree representations. Computational Linguistics, 24:613–632.
- Dan Klein and Christopher D. Manning. 2001. Parsing with treebank grammars: Empirical bounds, theoretical models, and the structure of the Penn treebank. In ACL 39/EACL 10.
- David M. Magerman. 1995. Statistical decision-tree models for parsing. In ACL 33, pages 276–283.
- Andrew Radford. 1988. Transformational Grammar. Cambridge University Press, Cambridge.
- Dana Ron, Yoram Singer, and Naftali Tishby. 1994. The power of amnesia. In Advances in Neural Information Processing Systems, volume 6, pages 176–183. Morgan Kaufmann.