Simple Semi-supervised Dependency Parsing

ACL (2008): 595–603

Cited by 475 | Viewed 112
EI

Abstract

We present a simple and effective semi-supervised method for training dependency parsers. We focus on the problem of lexical representation, introducing features that incorporate word clusters derived from a large unannotated corpus. We demonstrate the effectiveness of the approach in a series of dependency parsing experiments on the Penn Treebank and Prague Dependency Treebank, and we show that the cluster-based features yield substantial gains in performance across a wide range of conditions.

Introduction
  • Lexical information is seen as crucial to resolving ambiguous relationships, yet lexicalized statistics are sparse and difficult to estimate directly.
  • Dependency parsing depends critically on predicting head-modifier relationships, which can be difficult due to the statistical sparsity of these word-to-word interactions.
  • The Carreras (2007) parser has parts for both sibling interactions and grandparent interactions, such as the trio “*”, “plays”, and “Haag” in Figure 1.
  • These kinds of higher-order factorizations allow dependency parsers to obtain a limited form of context-sensitivity.
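
To make the higher-order factorization concrete, here is a minimal sketch of part-based tree scoring under a dict-based weight vector. All names are illustrative rather than the Carreras (2007) parser's actual API, and real second-order parsers restrict sibling parts to adjacent modifiers on the same side of the head; the sketch simplifies this.

```python
# Minimal sketch of higher-order part scoring in a factored dependency
# parser. All names here are illustrative, not the paper's API.
from collections import defaultdict

weights = defaultdict(float)  # feature weights, e.g. learned by a perceptron

def score_tree(words, heads):
    """Score a tree as a sum over first-order (head, modifier) parts
    plus second-order sibling and grandparent parts."""
    total = 0.0
    mods = defaultdict(list)  # head index -> its modifiers, left to right
    for m, h in enumerate(heads):
        if h is not None:  # heads[m] is None for the root word
            mods[h].append(m)
    for h, ms in mods.items():
        for i, m in enumerate(ms):
            total += weights[("hm", words[h], words[m])]  # first-order arc
            if i > 0:  # adjacent-sibling part (simplified: ignores head side)
                total += weights[("sib", words[h], words[ms[i - 1]], words[m])]
            g = heads[h]
            if g is not None:  # grandparent trio, e.g. "*", "plays", "Haag"
                total += weights[("gp", words[g], words[h], words[m])]
    return total
```
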
Highlights
  • In natural language parsing, lexical information is seen as crucial to resolving ambiguous relationships, yet lexicalized statistics are sparse and difficult to estimate directly
  • To demonstrate the effectiveness of our approach, we conduct experiments in dependency parsing, which has been the focus of much recent research—e.g., see work in the CoNLL shared tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007)
  • We show that our semi-supervised approach yields improvements for fixed datasets by performing parsing experiments on the Penn Treebank (Marcus et al., 1993) and Prague Dependency Treebank (Hajic, 1998; Hajic et al., 2001)
  • We found that it was nontrivial to select the proper prefix lengths for the dependency parsing task; in particular, the prefix lengths used in the Miller et al. (2004) work performed poorly in dependency parsing.
  • We have presented a simple but effective semi-supervised learning approach and demonstrated that it achieves substantial improvement over a competitive baseline in two broad-coverage dependency parsing tasks
  • There is a “mismatch” between the kind of lexical information that is captured by the Brown clusters and the kind of lexical information that is modeled in dependency parsing
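
As a concrete illustration of the cluster-based lexical representation, the sketch below derives features from a word's hierarchical Brown cluster, encoded as a bit string: short prefixes give coarse clusters, the full string gives the finest cluster. The cluster table is invented for illustration; real bit strings come from running the Brown et al. (1992) algorithm over a large unlabeled corpus.

```python
# Sketch of deriving cluster features from Brown cluster bit strings.
# The cluster table below is invented for illustration only.
brown_clusters = {
    "plays":  "0011010",
    "played": "0011011",
    "Haag":   "1101001",
}

def cluster_features(word, prefix_lengths=(4, 6)):
    """Return cluster features for a word: short bit-string prefixes
    (coarse clusters) plus the full string (the finest cluster)."""
    bits = brown_clusters.get(word)
    if bits is None:
        return []  # unknown word: no cluster features fire
    feats = [("c%d" % n, bits[:n]) for n in prefix_lengths]
    feats.append(("c*", bits))
    return feats

print(cluster_features("plays"))
# [('c4', '0011'), ('c6', '001101'), ('c*', '0011010')]
```
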
Methods
  • In order to evaluate the effectiveness of the cluster-based feature sets, the authors conducted dependency parsing experiments in English and Czech.
  • The English experiments were performed on the Penn Treebank (Marcus et al., 1993), using a standard set of head-selection rules (Yamada and Matsumoto, 2003) to convert the phrase-structure syntax of the Treebank to a dependency tree representation.
  • The part-of-speech tags for the development and test data were automatically assigned by MXPOST (Ratnaparkhi, 1996), where the tagger was trained on the entire training corpus; to generate part-of-speech tags for the training data, the authors used 10-way jackknifing. English word clusters were derived from the BLLIP corpus (Charniak et al., 2000), which contains roughly 43 million words of Wall Street Journal text.
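
The jackknifing step can be sketched briefly. Below is a minimal illustration assuming a generic tagger-training interface; `train_tagger` and `tagger.tag` are hypothetical stand-ins for a real tagger such as MXPOST. Each tenth of the training corpus is tagged by a tagger trained on the other nine tenths, so the training-set tags carry realistic, test-like error rates.

```python
# Sketch of 10-way jackknifing for tagging the training data.
# `train_tagger` and `tagger.tag` are hypothetical stand-ins.

def jackknife_tags(sentences, train_tagger, k=10):
    """Tag each fold with a tagger trained on the other k-1 folds.
    Note: output is grouped by fold, not in the original order."""
    folds = [sentences[i::k] for i in range(k)]
    tagged = []
    for i, held_out in enumerate(folds):
        # Train on all folds except fold i.
        train_data = [s for j, fold in enumerate(folds) if j != i for s in fold]
        tagger = train_tagger(train_data)
        tagged.extend(tagger.tag(s) for s in held_out)
    return tagged
```
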
Results
  • The authors demonstrate that the method improves performance when small amounts of training data are available, and can roughly halve the amount of supervised data required to reach a desired level of performance.
Conclusion
  • The authors have presented a simple but effective semi-supervised learning approach and demonstrated that it achieves substantial improvement over a competitive baseline in two broad-coverage dependency parsing tasks.
  • Despite this success, there are several ways in which the approach might be improved.
  • One could design clustering algorithms that cluster entire head-modifier arcs rather than individual words, as in the speculative sketch below.
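
That arc-clustering idea could, for instance, be prototyped by converting an automatically parsed corpus into a stream of composite head-modifier tokens and reusing an off-the-shelf word-clustering algorithm. The sketch below is speculative; `parse` is a hypothetical first-stage parser, not anything described in the paper.

```python
# Speculative sketch: treat each head-modifier pair from an
# automatically parsed corpus as one token, then feed the token
# stream to any word-clustering algorithm (e.g., Brown clustering).

def arc_token_stream(sentences, parse):
    for words in sentences:
        heads = parse(words)  # heads[m] = index of m's head; None for root
        for m, h in enumerate(heads):
            if h is not None:
                yield words[h] + "->" + words[m]  # one token per arc
```
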
Tables
  • Table 1: Examples of baseline and cluster-based feature templates. Each entry represents a class of indicators for tuples of information. For example, “ht,mt” represents a class of indicator features with one feature for each possible combination of head POS-tag and modifier POS-tag. Abbreviations: ht = head POS, hw = head word, hc4 = 4-bit prefix of the head's cluster, hc6 = 6-bit prefix, hc* = full bit string; mt, mw, mc4, mc6, mc* = likewise for modifier; st, gt, sc4, gc4, ... = likewise for sibling and grandchild. (A sketch of how such templates fire follows this list.)
  • Table 2: Parent-prediction accuracies on Sections 0, 1, 23, and 24. Abbreviations: dep1/dep1c = first-order parser with baseline/cluster-based features; dep2/dep2c = second-order parser with baseline/cluster-based features; MD1 = McDonald et al. (2005a); MD2 = McDonald and Pereira (2006); suffix -L = labeled parser. Unlabeled parsers are scored using unlabeled parent predictions, and labeled parsers are scored using labeled parent predictions. Improvements of cluster-based features over baseline features are shown in parentheses.
  • Table 3: Parent-prediction accuracies of unlabeled English parsers on Section 22. Abbreviations: Size = #sentences in training corpus; ∆ = difference between cluster-based and baseline features; other abbreviations are as in Table 2.
  • Table 4: Parent-prediction accuracies of unlabeled Czech parsers on the PDT 1.0 test set, for baseline features and cluster-based features. Abbreviations are as in Table 2.
  • Table 5: Unlabeled parent-prediction accuracies of Czech parsers on the PDT 1.0 test set, for our models and for previous work.
  • Table 6: Parent-prediction accuracies of unlabeled Czech parsers on the PDT 1.0 development set. Abbreviations are as in Table 3.
  • Table 7: Parent-prediction accuracies of unlabeled English parsers on Section 22. Abbreviations: N = threshold value; other abbreviations are as in Table 2. We did not train cluster-based parsers using threshold values larger than 800 due to computational limitations.
  • Table 8: Parent-prediction accuracies of unlabeled English parsers on Section 22. Abbreviations: suffix -P = model without POS; other abbreviations are as in Table 2.
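
As an illustration of how the templates in Table 1 expand into concrete indicator features, the sketch below instantiates a small template subset for a single head-modifier arc; the attribute dictionaries and template set are illustrative, not the paper's full feature set.

```python
# Hedged sketch of instantiating Table 1 style feature templates for
# one head-modifier arc. Attribute names follow Table 1 loosely.

def arc_features(head, mod):
    """head/mod carry a POS tag "t", word "w", and cluster prefixes
    "c4"/"c6" (cf. ht, hw, hc4, hc6 in Table 1)."""
    templates = [
        ("ht,mt",   (head["t"],  mod["t"])),   # baseline: POS pair
        ("hw,mw",   (head["w"],  mod["w"])),   # baseline: word pair (sparse)
        ("hc4,mc4", (head["c4"], mod["c4"])),  # cluster-based: coarse pair
        ("hc6,mt",  (head["c6"], mod["t"])),   # hybrid: cluster plus POS
    ]
    # Each distinct (template, value-tuple) names one binary feature.
    return [name + "=" + "|".join(vals) for name, vals in templates]

h = {"t": "VBZ", "w": "plays", "c4": "0011", "c6": "001101"}
m = {"t": "NNP", "w": "Haag",  "c4": "1101", "c6": "110100"}
print(arc_features(h, m))  # e.g. ['ht,mt=VBZ|NNP', 'hw,mw=plays|Haag', ...]
```
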
Related Work
  • As mentioned earlier, our approach was inspired by the success of Miller et al. (2004), who demonstrated the effectiveness of using word clusters as features in a discriminative learning approach. Our research, however, applies this technique to dependency parsing rather than named-entity recognition.

    In this paper, we have focused on developing new representations for lexical information. Previous research in this area includes several models which incorporate hidden variables (Matsuzaki et al., 2005; Koo and Collins, 2005; Petrov et al., 2006; Titov and Henderson, 2007). These approaches have the advantage that the model is able to learn different usages for the hidden variables, depending on the target problem at hand. Crucially, however, these methods do not exploit unlabeled data when learning their representations.
Funding
  • Terry Koo was funded by NSF grant DMS-0434222 and a grant from NTT, Agmt
  • Xavier Carreras was supported by the Catalan Ministry of Innovation, Universities and Enterprise, and a grant from NTT, Agmt
  • Michael Collins was funded by NSF grants 0347631 and DMS-0434222
References
  • P.F. Brown, V.J. Della Pietra, P.V. deSouza, J.C. Lai, and R.L. Mercer. 1992. Class-Based n-gram Models of Natural Language. Computational Linguistics, 18(4):467–479.
  • S. Buchholz and E. Marsi. 2006. CoNLL-X Shared Task on Multilingual Dependency Parsing. In Proceedings of CoNLL, pages 149–164.
  • X. Carreras. 2007. Experiments with a Higher-Order Projective Dependency Parser. In Proceedings of EMNLP-CoNLL, pages 957–961.
  • E. Charniak, D. Blaheta, N. Ge, K. Hall, and M. Johnson. 2000. BLLIP 1987–89 WSJ Corpus Release 1, LDC No. LDC2000T43. Linguistic Data Consortium.
  • Y.J. Chu and T.H. Liu. 1965. On the shortest arborescence of a directed graph. Science Sinica, 14:1396–1400.
  • M. Collins, J. Hajic, L. Ramshaw, and C. Tillmann. 1999. A Statistical Parser for Czech. In Proceedings of ACL, pages 505–512.
  • M. Collins. 2002. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. In Proceedings of EMNLP, pages 1–8.
  • K. Crammer and Y. Singer. 2003. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research, 3:951–991.
  • K. Crammer, O. Dekel, S. Shalev-Shwartz, and Y. Singer. 2004. Online Passive-Aggressive Algorithms. In S. Thrun, L. Saul, and B. Scholkopf, editors, NIPS 16, pages 1229–1236.
  • J. Edmonds. 1967. Optimum branchings. Journal of Research of the National Bureau of Standards, 71B:233–240.
  • J. Eisner. 2000. Bilexical Grammars and Their Cubic-Time Parsing Algorithms. In H. Bunt and A. Nijholt, editors, Advances in Probabilistic and Other Parsing Technologies, pages 29–62. Kluwer Academic Publishers.
  • Y. Freund and R. Schapire. 1999. Large Margin Classification Using the Perceptron Algorithm. Machine Learning, 37(3):277–296.
  • J. Hajic, E. Hajicova, P. Pajas, J. Panevova, and P. Sgall. 2001. The Prague Dependency Treebank 1.0, LDC No. LDC2001T10. Linguistic Data Consortium.
  • J. Hajic. 1998. Building a Syntactically Annotated Corpus: The Prague Dependency Treebank. In E. Hajicova, editor, Issues of Valency and Meaning. Studies in Honor of Jarmila Panevova, pages 12–19.
  • K. Hall and V. Novak. 2005. Corrective Modeling for Non-Projective Dependency Parsing. In Proceedings of IWPT, pages 42–52.
  • T. Koo and M. Collins. 2005. Hidden-Variable Models for Discriminative Reranking. In Proceedings of HLT-EMNLP, pages 507–514.
  • W. Li and A. McCallum. 2005. Semi-Supervised Sequence Modeling with Syntactic Topic Models. In Proceedings of AAAI, pages 813–818.
  • P. Liang. 2005. Semi-Supervised Learning for Natural Language. Master's thesis, Massachusetts Institute of Technology.
  • M.P. Marcus, B. Santorini, and M. Marcinkiewicz. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.
  • T. Matsuzaki, Y. Miyao, and J. Tsujii. 2005. Probabilistic CFG with Latent Annotations. In Proceedings of ACL, pages 75–82.
  • D. McClosky, E. Charniak, and M. Johnson. 2006. Effective Self-Training for Parsing. In Proceedings of HLT-NAACL, pages 152–159.
  • R. McDonald and F. Pereira. 2006. Online Learning of Approximate Dependency Parsing Algorithms. In Proceedings of EACL, pages 81–88.
  • R. McDonald, K. Crammer, and F. Pereira. 2005a. Online Large-Margin Training of Dependency Parsers. In Proceedings of ACL, pages 91–98.
  • R. McDonald, F. Pereira, K. Ribarov, and J. Hajic. 2005b. Non-Projective Dependency Parsing using Spanning Tree Algorithms. In Proceedings of HLT-EMNLP, pages 523–530.
  • S. Miller, J. Guinness, and A. Zamanian. 2004. Name Tagging with Word Clusters and Discriminative Training. In Proceedings of HLT-NAACL, pages 337–342.
  • J. Nivre and J. Nilsson. 2005. Pseudo-Projective Dependency Parsing. In Proceedings of ACL, pages 99–106.
  • J. Nivre, J. Hall, S. Kubler, R. McDonald, J. Nilsson, S. Riedel, and D. Yuret. 2007. The CoNLL 2007 Shared Task on Dependency Parsing. In Proceedings of EMNLP-CoNLL 2007, pages 915–932.
  • S. Petrov, L. Barrett, R. Thibaux, and D. Klein. 2006. Learning Accurate, Compact, and Interpretable Tree Annotation. In Proceedings of COLING-ACL, pages 433–440.
  • A. Ratnaparkhi. 1996. A Maximum Entropy Model for Part-Of-Speech Tagging. In Proceedings of EMNLP, pages 133–142.
  • I. Titov and J. Henderson. 2007. Constituent Parsing with Incremental Sigmoid Belief Networks. In Proceedings of ACL, pages 632–639.
  • Q.I. Wang, D. Schuurmans, and D. Lin. 2005. Strictly Lexical Dependency Parsing. In Proceedings of IWPT, pages 152–159.
  • H. Yamada and Y. Matsumoto. 2003. Statistical Dependency Analysis With Support Vector Machines. In Proceedings of IWPT, pages 195–206.