Improved Relation Extraction with Feature-rich Compositional Embedding Models

Conference on Empirical Methods in Natural Language Processing, pp. 1774-1784, 2015.


Abstract:

Compositional embedding models build a representation (or embedding) for a linguistic structure based on its component word embeddings. We propose a Feature-rich Compositional Embedding Model (FCM) for relation extraction that is expressive, generalizes to new domains, and is easy-to-implement. The key idea is to combine both (unlexicalized) hand-crafted features with learned word embeddings.
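To make the composition concrete, the sketch below illustrates the kind of scoring the abstract describes: each word contributes the outer product of a binary feature vector and its word embedding, the per-word contributions are summed, and each relation label is scored against that sum with a label-specific parameter tensor before a softmax. The function names, shapes, and toy dimensions are illustrative assumptions, not the paper's released implementation.

```python
import numpy as np

def fcm_scores(F, E, T):
    """FCM-style scores for each relation label.

    F : (n_words, n_feats)       binary feature vector f_i for each word
    E : (n_words, d)             word embedding e_i for each word
    T : (n_labels, n_feats, d)   per-label parameter tensor

    The substructure embedding is the sum over words of the outer
    product f_i (x) e_i; each label's score is the inner product of
    that matrix with the label's slice of T.
    """
    S = F.T @ E                              # (n_feats, d): sum_i f_i e_i^T
    return np.einsum('yfd,fd->y', T, S)      # one score per label

def fcm_probs(F, E, T):
    """Softmax over the per-label scores."""
    s = fcm_scores(F, E, T)
    s = s - s.max()                          # numerical stability
    p = np.exp(s)
    return p / p.sum()

# Toy usage: 4 words, 6 binary features, 5-dim embeddings, 3 relation labels.
rng = np.random.default_rng(0)
F = rng.integers(0, 2, size=(4, 6)).astype(float)
E = rng.normal(size=(4, 5))
T = 0.1 * rng.normal(size=(3, 6, 5))
print(fcm_probs(F, E, T))                    # distribution over the 3 labels
```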

Code: https://github.com/mgormley/pacaya

Introduction
  • Two common NLP feature types are lexical properties of words and unlexicalized linguistic/structural interactions between words.
  • Prior work on relation extraction has extensively studied how to design such features by combining discrete lexical properties of a word (e.g., the identity of the word, its lemma, its morphological features) with aspects of a word’s linguistic context.
  • While these help learning, they make generalization to unseen words difficult.
  • Embeddings can capture lexical information, but alone they are insufficient: in state-of-the-art systems, they are used alongside features of the broader linguistic context
Highlights
  • Two common NLP feature types are lexical properties of words and unlexicalized linguistic/structural interactions between words
  • We introduce a compositional model that combines unlexicalized linguistic context and word embeddings for relation extraction, a task in which contextual feature construction plays a major role in generalizing to unseen data
  • While we focus on the relation extraction task, the framework applies to any task that benefits from both embeddings and typical hand-engineered lexical features
  • We find that on all domains the combination baseline + Feature-rich Compositional Embedding Model (FCM) (5) obtains the highest F1 and significantly outperforms the other baselines, yielding the best reported results for this task
  • We have presented FCM, a new compositional model for deriving sentence-level and substructure embeddings from word embeddings
  • We have demonstrated that FCM alone attains near state-of-the-art performances on several relation extraction tasks, and in combination with traditional feature based loglinear models it obtains state-of-the-art results
Results
  • ACE 2005: Despite its simple feature set, FCM (1) is competitive with the log-linear baseline (3) on out-of-domain test sets (Table 3).
  • The authors found that fine-tuning of the embeddings (2) did not yield improvements on the out-of-domain development set, in contrast to the results for SemEval.
  • The authors suspect this is because fine-tuning allows the model to overfit the training domain, which hurts performance on the unseen ACE test domains (see the sketch after this list).
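The effect described in the last two bullets comes down to whether the pre-trained embeddings are updated during training. The sketch below shows one SGD step on a simple FCM log-loss with an explicit toggle for updating the embeddings; the loss, update rule, and names are illustrative assumptions rather than the paper's exact training procedure.

```python
import numpy as np

def fcm_sgd_step(F, E, T, gold, lr=0.01, fine_tune_embeddings=False):
    """One SGD step on -log p(gold) for a single instance.

    F : (n_words, n_feats), E : (n_words, d), T : (n_labels, n_feats, d).
    With fine_tune_embeddings=False the pre-trained embeddings E stay
    fixed and only T is updated; with True, E is updated as well (which
    can overfit the training domain and hurt out-of-domain performance).
    """
    S = F.T @ E                                   # (n_feats, d)
    s = np.einsum('yfd,fd->y', T, S)              # per-label scores
    p = np.exp(s - s.max()); p /= p.sum()         # softmax
    g = p.copy(); g[gold] -= 1.0                  # d(loss)/d(scores)

    grad_T = np.einsum('y,fd->yfd', g, S)         # d(loss)/dT
    if fine_tune_embeddings:
        grad_E = F @ np.einsum('y,yfd->fd', g, T) # d(loss)/dE (old T)
        E = E - lr * grad_E
    T = T - lr * grad_T
    return T, E
```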
Conclusion
  • The authors have presented FCM, a new compositional model for deriving sentence-level and substructure embeddings from word embeddings.
  • The authors have demonstrated that FCM alone attains near state-of-the-art performance on several relation extraction tasks, and that in combination with traditional feature-based log-linear models it obtains state-of-the-art results.
  • The authors' next steps in improving FCM focus on enhancements based on task-specific embeddings or loss functions, as in Hashimoto et al. (2015) and dos Santos et al. (2015).
  • The authors plan to explore the above applications of FCM in the future.
Tables
  • Table 1: Examples from ACE 2005. In (1) the word “driving” is a strong indicator of the relation ART between M1 and M2
  • Table 2: Feature sets used in FCM
  • Table 3: Comparison of models on ACE 2005 out-of-domain test sets. Baseline + HeadOnly is our reimplementation of the features of Nguyen and Grishman (2014)
  • Table 4: Comparison of FCM with previously published results for SemEval 2010 Task 8. The only exception is the DepNN model, which obtains a better result than FCM with the same embeddings. The task-specific embeddings from Hashimoto et al. (2015) lead to the best performance (an improvement of 0.7%)
  • Table 5: Ablation test of FCM on the development set
  • Table 6: Evaluation of FCM with different word embeddings on SemEval 2010 Task 8
Related work
  • Compositional Models for Sentences: In order to build a representation (embedding) for a sentence based on its component word embeddings and structural information, recent work on compositional models (stemming from the deep learning community) has designed model structures that mimic the structure of the input. For example, these models could take into account the order of the words (as in Convolutional Neural Networks (CNNs)) (Collobert et al., 2011) or build off of an input tree (as in Recursive Neural Networks (RNNs) or the Semantic Matching Energy Function) (Socher et al., 2013b; Bordes et al., 2012).

    While these models work well on sentence-level representations, the nature of their designs also limits them to fixed types of substructures from the annotated sentence, such as chains for CNNs and trees for RNNs. Such models cannot capture arbitrary combinations of linguistic annotations available for a given task, such as word order, dependency tree, and named entities used for relation extraction. Moreover, these approaches ignore the differences in function between words appearing in different roles. This does not suit more general substructure labeling tasks in NLP; e.g., these models cannot be directly applied to relation extraction since they will output the same result for any pair of entities in the same sentence.
Funding
  • Mo Yu is supported by the China Scholarship Council and by NSFC 61173073
Study subjects and analysis
  • Table 1 examples (ACE 2005): (1) M1 “a man”, M2 “a taxicab”, snippet “A man driving what appeared to be a taxicab”; (2) M1 “the southern suburbs”, M2 “Baghdad”, snippet “direction of the southern suburbs of Baghdad”; (3) M1 “the united states”, M2 “284 people”, snippet “in the united states, 284 people died”
  • In (1) the word “driving” is a strong indicator of the relation ART between M1 and M2

Reference
  • Yonatan Belinkov, Tao Lei, Regina Barzilay, and Amir Globerson. 2014. Exploring compositional architectures and word vector representations for prepositional phrase attachment. Transactions of the Association for Computational Linguistics, 2:561–572.
  • Antoine Bordes, Xavier Glorot, Jason Weston, and Yoshua Bengio. 2012. A semantic matching energy function for learning with multi-relational data. Machine Learning, pages 1–27.
  • Massimiliano Ciaramita and Yasemin Altun. 2006. Broad-coverage sense disambiguation and information extraction with a supersense sequence tagger. In EMNLP 2006, pages 594–602, July.
  • Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. JMLR, 12:2493–2537.
  • Cicero dos Santos, Bing Xiang, and Bowen Zhou. 2015. Classifying relations by ranking with convolutional neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 626–634, Beijing, China, July. Association for Computational Linguistics.
  • Kazuma Hashimoto, Pontus Stenetorp, Makoto Miwa, and Yoshimasa Tsuruoka. 2015. Task-oriented learning of word embeddings for semantic relation classification. arXiv preprint arXiv:1503.00095.
  • Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid O Seaghdha, Sebastian Pado, Marco Pennacchiotti, Lorenza Romano, and Stan Szpakowicz. 2010. SemEval-2010 Task 8: Multi-way classification of semantic relations between pairs of nominals. In Proceedings of the SemEval-2 Workshop.
  • Karl Moritz Hermann and Phil Blunsom. 2013. The role of syntax in vector space models of compositional semantics. In Association for Computational Linguistics, pages 894–904.
  • Karl Moritz Hermann, Dipanjan Das, Jason Weston, and Kuzman Ganchev. 2014. Semantic frame identification with distributed word representations. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1448–1458, Baltimore, Maryland, June. Association for Computational Linguistics.
  • Heng Ji, Ralph Grishman, Hoa Trang Dang, Kira Griffitt, and Joe Ellis. 2010. Overview of the TAC 2010 knowledge base population track. In Third Text Analysis Conference (TAC 2010).
  • Terry Koo, Xavier Carreras, and Michael Collins. 2008. Simple semi-supervised dependency parsing. In Proceedings of ACL-08: HLT, pages 595–603, Columbus, Ohio, June. Association for Computational Linguistics.
  • Qi Li and Heng Ji. 2014. Incremental joint extraction of entity mentions and relations. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 402–412, Baltimore, Maryland, June. Association for Computational Linguistics.
  • Yang Liu, Furu Wei, Sujian Li, Heng Ji, Ming Zhou, and Houfeng Wang. 2015. A dependency-based neural network for relation classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 285–290, Beijing, China, July. Association for Computational Linguistics.
  • Mingbo Ma, Liang Huang, Bowen Zhou, and Bing Xiang. 2015. Dependency-based convolutional neural networks for sentence embedding. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 174–179, Beijing, China, July. Association for Computational Linguistics.
  • Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. arXiv preprint arXiv:1310.4546.
  • Scott Miller, Jethran Guinness, and Alex Zamanian. 2004. Name tagging with word clusters and discriminative training. In Susan Dumais, Daniel Marcu, and Salim Roukos, editors, HLT-NAACL 2004: Main Proceedings. Association for Computational Linguistics.
  • Alexis Mitchell, Stephanie Strassel, Shudong Huang, and Ramez Zakhary. 2005. ACE 2004 multilingual training corpus. Linguistic Data Consortium, Philadelphia.
  • Andriy Mnih and Geoffrey Hinton. 2007. Three new graphical models for statistical language modelling. In Proceedings of the 24th International Conference on Machine Learning, pages 641–648. ACM.
  • Thien Huu Nguyen and Ralph Grishman. 2014. Employing word representations and regularization for domain adaptation of relation extraction. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 68–74, Baltimore, Maryland, June. Association for Computational Linguistics.
  • Thien Huu Nguyen and Ralph Grishman. 2015. Relation extraction: Perspective from convolutional neural networks. In Proceedings of the NAACL Workshop on Vector Space Modeling for NLP.
  • Thien Huu Nguyen, Barbara Plank, and Ralph Grishman. 2015. Semantic representations for domain adaptation: A case study on the tree kernel-based method for relation extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 635–644, Beijing, China, July. Association for Computational Linguistics.
  • Robert Parker, David Graff, Junbo Kong, Ke Chen, and Kazuaki Maeda. 2011. English Gigaword fifth edition, June. Linguistic Data Consortium, LDC2011T07.
  • Barbara Plank and Alessandro Moschitti. 2013. Embedding semantic similarity in tree kernels for domain adaptation of relation extraction. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1498–1507, Sofia, Bulgaria, August. Association for Computational Linguistics.
  • Bryan Rink and Sanda Harabagiu. 2010. UTD: Classifying semantic relations by combining lexical and semantic resources. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 256–259, Uppsala, Sweden, July. Association for Computational Linguistics.
  • Michael Roth and Kristian Woodsend. 2014. Composition of word representations improves semantic role labelling. In EMNLP.
  • Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1201–1211, Jeju Island, Korea, July. Association for Computational Linguistics.
  • Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. 2013a. Parsing with compositional vector grammars. In Proceedings of the ACL Conference.
  • Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013b. Recursive deep models for semantic compositionality over a sentiment treebank. In Empirical Methods in Natural Language Processing, pages 1631–1642.
  • Ang Sun, Ralph Grishman, and Satoshi Sekine. 2011. Semi-supervised relation extraction with large-scale word clustering. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 521–529, Portland, Oregon, USA, June. Association for Computational Linguistics.
  • Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D. Manning. 2012. Multi-instance multi-label learning for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 455–465. Association for Computational Linguistics.
  • Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In Association for Computational Linguistics, pages 384–394.
  • Christopher Walker, Stephanie Strassel, Julie Medero, and Kazuaki Maeda. 2006. ACE 2005 multilingual training corpus. Linguistic Data Consortium, Philadelphia.
  • Mo Yu and Mark Dredze. 2015. Learning composition models for phrase embeddings. Transactions of the Association for Computational Linguistics, 3:227–242.
  • Mo Yu, Matthew R. Gormley, and Mark Dredze. 2015. Combining word embeddings and feature embeddings for fine-grained relation extraction. In Proceedings of NAACL.
  • Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. 2014. Relation classification via convolutional deep neural network. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 2335–2344, Dublin, Ireland, August. Dublin City University and Association for Computational Linguistics.
  • GuoDong Zhou, Jian Su, Jie Zhang, and Min Zhang. 2005. Exploring various knowledge in relation extraction. In Association for Computational Linguistics, pages 427–434.