Accurate Unlexicalized Parsing

ACL 2003, pp. 423–430

Cited by 3713
Abstract

We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar. Indeed, its performance of 86.36% (LP/LR F1) is better than that of early lexicalized PCFG models, and surprisingly close to the current state-of-the-art.

Introduction
  • Several results have brought into question how large a role lexicalization plays in such parsers. Johnson (1998) showed that the performance of an unlexicalized PCFG over the Penn treebank could be improved enormously by annotating each node by its parent category.
  • To the extent that no such strong baseline has been provided, the community has tended to greatly overestimate the beneficial effect of lexicalization in probabilistic parsing, rather than looking critically at where lexicalized probabilities are both needed to make the right decision and available in the training data
  • This result affirms the value of linguistic analysis for feature discovery.
  • The authors see this investigation as only one part of the foundation for state-of-the-art parsing which employs both lexical and structural conditioning
Highlights
  • Several results have brought into question how large a role lexicalization plays in such parsers. Johnson (1998) showed that the performance of an unlexicalized probabilistic context-free grammar (PCFG) over the Penn treebank could be improved enormously by annotating each node by its parent category (a toy illustration of this kind of annotation appears after this list)
  • We show that the parsing performance that can be achieved by an unlexicalized PCFG is far higher than has previously been demonstrated, and is much higher than community wisdom has thought possible
  • We present linguistically motivated annotations which do much to close the gap between a vanilla PCFG and state-of-the-art lexicalized models
  • We construct an unlexicalized PCFG which outperforms the lexicalized PCFGs of Magerman (1995) and Collins (1996) (though not more recent models, such as Charniak (1997) or Collins (1999)). One benefit of this result is a much-strengthened lower bound on the capacity of an unlexicalized PCFG
  • We have shown that, surprisingly, the maximum-likelihood estimate of a compact unlexicalized PCFG can parse on par with early lexicalized parsers
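To make the parent-annotation idea above concrete, here is a minimal Python sketch (not the authors' implementation, which worked on Penn treebank trees directly) that relabels each nonterminal with its parent's category, so that, for example, an NP under S becomes NP^S. The nested-list tree encoding and the ^ separator are assumptions made purely for illustration.

    def parent_annotate(tree, parent=None, sep="^"):
        # A tree is a nested list [label, child1, ...]; leaves are plain word strings.
        if isinstance(tree, str):
            return tree                      # a word: leave unchanged
        label, children = tree[0], tree[1:]
        new_label = label if parent is None else label + sep + parent
        # Children see this node's original (unannotated) category as their parent.
        return [new_label] + [parent_annotate(c, parent=label, sep=sep) for c in children]

    # (S (NP (PRP They)) (VP (VBD slept)))
    tree = ["S", ["NP", ["PRP", "They"]], ["VP", ["VBD", "slept"]]]
    print(parent_annotate(tree))
    # ['S', ['NP^S', ['PRP^NP', 'They']], ['VP^S', ['VBD^VP', 'slept']]]

Applying a transform like this to the training trees before reading off the grammar yields the annotated (state-split) grammar; stripping everything after the separator recovers the original categories for evaluation.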
Methods
  • To facilitate comparison with previous work, the authors trained the models on sections 2–21 of the WSJ section of the Penn treebank.
  • The authors used the first 20 files (393 sentences) of section 22 as a development set.
  • All of section 23 was used as a test set for the final model.
  • Given a set of transformed trees, the authors viewed the local trees as grammar rewrite rules in the standard way and used maximum-likelihood estimates for rule probabilities (a toy sketch of this estimation step appears after this list). To parse with the grammar, they used a simple array-based Java implementation of a generalized CKY parser, which, for the final best model, was able to exhaustively parse all sentences in section 23 in 1 GB of memory, taking approximately 3 seconds for average-length sentences.
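As a rough illustration of the estimation step just described (the actual system was an array-based Java CKY parser, which this sketch does not attempt to reproduce), the following Python snippet reads rewrite rules off the local trees of a toy treebank and estimates rule probabilities by relative frequency. The nested-list tree encoding follows the earlier sketch and is an assumption made for illustration.

    from collections import defaultdict

    def extract_rules(tree, rule_counts, lhs_counts):
        # Each local tree (a parent with its immediate children) contributes
        # one observation of the rewrite rule LHS -> RHS.
        if isinstance(tree, str):
            return                            # a word: no rule rooted here
        label, children = tree[0], tree[1:]
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
        for child in children:
            extract_rules(child, rule_counts, lhs_counts)

    def mle_pcfg(treebank):
        # Maximum-likelihood (relative-frequency) estimate:
        #   P(LHS -> RHS) = count(LHS -> RHS) / count(LHS)
        rule_counts, lhs_counts = defaultdict(int), defaultdict(int)
        for tree in treebank:
            extract_rules(tree, rule_counts, lhs_counts)
        return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

    treebank = [["S", ["NP", ["PRP", "They"]], ["VP", ["VBD", "slept"]]]]
    for (lhs, rhs), p in sorted(mle_pcfg(treebank).items()):
        print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")

Parsing with such a grammar then amounts to running a (generalized) CKY chart parser over these weighted rules; the paper's Java implementation of that step is not sketched here.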
Results
  • The authors took the final model and used it to parse section 23 of the treebank. Figure 8 shows the results.
  • The test set F1 is 86.32% for sentences of ≤ 40 words, already higher than that of early lexicalized models, though lower than that of current state-of-the-art parsers.
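For reference, the F1 figures quoted here are the standard harmonic mean of labeled precision (LP) and labeled recall (LR); this definition is not spelled out on this page but is the usual one:

    F_1 = \frac{2 \cdot \mathrm{LP} \cdot \mathrm{LR}}{\mathrm{LP} + \mathrm{LR}}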
Conclusion
  • The advantages of unlexicalized grammars are clear enough – easy to estimate, easy to parse with, and time- and space-efficient.
  • The authors have shown that, surprisingly, the maximum-likelihood estimate of a compact unlexicalized PCFG can parse on par with early lexicalized parsers.
  • The authors have shown ways to improve parsing, some easier than lexicalization and others orthogonal to it, which could presumably be used to benefit lexicalized parsers as well
Funding
  • This paper is based on work supported in part by the National Science Foundation under Grant No. IIS-0085896, and in part by an IBM Faculty Partnership Award to the second author.
References
  • James K. Baker. 1979. Trainable grammars for speech recognition. In D. H. Klatt and J. J. Wolf, editors, Speech Communication Papers for the 97th Meeting of the Acoustical Society of America, pages 547–550.
  • Taylor L. Booth and Richard A. Thomson. 1973. Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22:442–450.
  • Sharon A. Caraballo and Eugene Charniak. 1998. New figures of merit for best-first probabilistic chart parsing. Computational Linguistics, 24:275–298.
  • Eugene Charniak, Sharon Goldwater, and Mark Johnson. 1998. Edge-based best-first chart parsing. In Proceedings of the Sixth Workshop on Very Large Corpora, pages 127–133.
  • Eugene Charniak. 1996. Tree-bank grammars. In Proceedings of the 13th National Conference on Artificial Intelligence, pages 1031–1036.
  • Eugene Charniak. 1997. Statistical parsing with a context-free grammar and word statistics. In Proceedings of the 14th National Conference on Artificial Intelligence, pages 598–603.
  • Eugene Charniak. 2000. A maximum-entropy-inspired parser. In NAACL 1, pages 132–139.
  • Eugene Charniak. 2001. Immediate-head parsing for language models. In ACL 39.
  • Noam Chomsky. 1965. Aspects of the Theory of Syntax. MIT Press, Cambridge, MA.
  • Michael John Collins. 1996. A new statistical parser based on bigram lexical dependencies. In ACL 34, pages 184–191.
  • Michael Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
  • Jason Eisner and Giorgio Satta. 1999. Efficient parsing for bilexical context-free grammars and head-automaton grammars. In ACL 37, pages 457–464.
  • Marilyn Ford, Joan Bresnan, and Ronald M. Kaplan. 1982. A competence-based theory of syntactic closure. In Joan Bresnan, editor, The Mental Representation of Grammatical Relations, pages 727–796. MIT Press, Cambridge, MA.
  • Daniel Gildea. 2001. Corpus variation and parser performance. In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP).
  • Donald Hindle and Mats Rooth. 1993. Structural ambiguity and lexical relations. Computational Linguistics, 19(1):103–120.
  • Mark Johnson. 1998. PCFG models of linguistic tree representations. Computational Linguistics, 24:613–632.
  • Dan Klein and Christopher D. Manning. 2001. Parsing with treebank grammars: Empirical bounds, theoretical models, and the structure of the Penn treebank. In ACL 39/EACL 10.
  • David M. Magerman. 1995. Statistical decision-tree models for parsing. In ACL 33, pages 276–283.
  • Andrew Radford. 1988. Transformational Grammar. Cambridge University Press, Cambridge.
  • Dana Ron, Yoram Singer, and Naftali Tishby. 1994. The power of amnesia. In Advances in Neural Information Processing Systems, volume 6, pages 176–183. Morgan Kaufmann.
Best Paper
Winner of the ACL Best Paper Award, 2003