Knowledge-based weak supervision for information extraction of overlapping relations

ACL, pp. 541–550, 2011.


Abstract:

Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web's natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. The model is applied to learn extractors for NY Times text using weak supervision from Freebase, and experiments show that the approach runs quickly and yields surprising gains in accuracy at both the aggregate and sentential extraction levels.

Introduction
  • This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts (a toy sketch of this two-level structure follows this list).
  • The authors apply their model to learn extractors for NY Times text using weak supervision from Freebase.
  • Experiments show that the approach runs quickly and yields surprising gains in accuracy at both the aggregate and sentence level.
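As a rough illustration of that two-level structure (a hedged sketch, not the authors' exact factor-graph model), the snippet below assigns each sentence mentioning an entity pair a latent relation label Z and aggregates those labels into entity-pair facts Y with a deterministic OR; `score` stands in for a learned sentence-level scoring function and is assumed, not taken from the paper:

```python
def extract_facts(sentences, relations, score):
    """Toy two-level extraction for a single entity pair.

    sentences: list of sentences mentioning the entity pair.
    relations: list of candidate relation names.
    score: assumed placeholder, score(sentence, relation) -> float.
    """
    # Sentence level (Z): each sentence gets its highest-scoring label;
    # 'NONE' means the sentence expresses no relation.
    z = [max(relations + ['NONE'], key=lambda r: score(s, r))
         for s in sentences]
    # Corpus level (Y): a fact holds iff some sentence expresses it
    # (a deterministic OR over the sentence-level labels).
    return {r for r in z if r != 'NONE'}
```

With a trained scoring function, calling this once per entity pair yields the aggregate fact set while still committing to a specific extraction decision for every individual sentence.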
Highlights
  • This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts
  • Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level
  • The authors present experiments showing that MULTIR outperforms a reimplementation of the approach of Riedel et al. (2010) on both aggregate and sentential extractions
  • Given the set of matches, define Σ to be the set of NY Times sentences with two matched phrases, E to be the set of Freebase entities mentioned in one or more sentences, ∆ to be the set of Freebase facts whose arguments e1 and e2 were mentioned in a sentence in Σ, and R to be the set of relation names used in the facts of ∆
  • While the approach of Riedel et al. does include a model of which sentences express relations, it makes significant use of aggregate features that are primarily designed for entity-level relation prediction, and it has a less detailed model of extraction at the individual sentence level
  • The authors argue that weak supervision is a promising method
Methods
  • The authors follow the approach of Riedel et al. (2010) for generating weak supervision data, computing features, and evaluating aggregate extraction.
  • Given the set of matches, define Σ to be the set of NY Times sentences with two matched phrases, E to be the set of Freebase entities mentioned in one or more sentences, ∆ to be the set of Freebase facts whose arguments e1 and e2 were mentioned in a sentence in Σ, and R to be the set of relation names used in the facts of ∆
  • These sets define the weak supervision data (a small construction sketch follows this list)
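A minimal construction sketch under the definitions above, assuming matched sentences arrive as (sentence, e1, e2) triples and Freebase facts as (e1, relation, e2) triples; all function and variable names are illustrative:

```python
def build_weak_supervision_data(matched_sentences, freebase_facts):
    """Assemble the sets Sigma, E, Delta, and R described above."""
    # Sigma: NY Times sentences containing two matched entity phrases.
    sigma = list(matched_sentences)

    # E: Freebase entities mentioned in at least one matched sentence.
    entities = {e for _, e1, e2 in sigma for e in (e1, e2)}

    # Delta: Freebase facts whose arguments e1 and e2 co-occur in a
    # sentence in Sigma.
    pairs_in_text = {(e1, e2) for _, e1, e2 in sigma}
    delta = [(e1, r, e2) for e1, r, e2 in freebase_facts
             if (e1, e2) in pairs_in_text]

    # R: relation names used by the facts in Delta.
    relations = {r for _, r, _ in delta}

    return sigma, entities, delta, relations
```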
Results
  • The authors are still able to improve performance on both the sentential and aggregate extraction tasks.
Conclusion
  • This model was designed to provide a joint approach where extraction decisions are almost entirely driven by sentence-level reasoning.
  • The authors can train the model so that the Y variables match the facts in the database, treating the Zi as hidden variables that can take any value as long as they produce the correct aggregate predictions (a hedged training sketch follows this list)
  • This approach is related to the multi-instance learning approach of Riedel et al. (2010), in that both models include sentence-level and aggregate random variables.
  • By using the contents of a database to heuristically label a training corpus, the authors may be able to automatically learn a potentially unbounded number of relation extractors.
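A hedged sketch of one training update in that spirit: the unconstrained sentence-level prediction is compared against the best label assignment consistent with the database, and the weights are nudged toward the latter. The paper's exact constrained inference jointly guarantees that every database fact is expressed by at least one sentence; the per-sentence simplification below does not, and every name here is illustrative:

```python
def perceptron_step(weights, sentences, db_relations, relations, features):
    """One simplified perceptron-style update for a single entity pair.

    weights: dict mapping feature name -> float, updated in place.
    sentences: sentences mentioning the entity pair.
    db_relations: set of relations Freebase asserts for the pair.
    relations: list of all candidate relation names.
    features: assumed placeholder, features(sentence, relation) -> list.
    """
    def score(s, r):
        return sum(weights.get(f, 0.0) for f in features(s, r))

    for s in sentences:
        # Unconstrained prediction: what the model currently believes.
        predicted = max(relations + ['NONE'], key=lambda r: score(s, r))
        # Best label consistent with the database facts; the hidden Z
        # may also be 'NONE', i.e., the sentence expresses no relation.
        allowed = sorted(db_relations) + ['NONE']
        target = max(allowed, key=lambda r: score(s, r))
        if predicted != target:
            for f in features(s, target):
                weights[f] = weights.get(f, 0.0) + 1.0
            for f in features(s, predicted):
                weights[f] = weights.get(f, 0.0) - 1.0
```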
Tables
  • Table 1: Estimated precision and recall by relation, as well as the number of matched sentences (#sents) and the accuracy (% true) of matches between sentences and facts in Freebase
Related work
  • Supervised-learning approaches to IE were introduced in (Soderland et al., 1995) and are too numerous to summarize here. While they offer high precision and recall, these methods are unlikely to scale to the thousands of relations found in text on the Web. Open IE systems, which perform self-supervised learning of relation-independent extractors (e.g., Preemptive IE (Shinyama and Sekine, 2006), TEXTRUNNER (Banko et al., 2007; Banko and Etzioni, 2008), and WOE (Wu and Weld, 2010)), can scale to millions of documents, but do not output canonicalized relations.

    8.1 Weak Supervision

    Weak supervision (also known as distant or self supervision) refers to a broad class of methods, but we focus on the increasingly popular idea of using a store of structured data to heuristically label a textual corpus. Craven and Kumlien (1999) introduced the idea by matching the Yeast Protein Database (YPD) to the abstracts of papers in PubMed and training a naive-Bayes extractor. Bellare and McCallum (2007) used a database of BibTex records to train a CRF extractor on 12 bibliographic relations. The KYLIN system applied weak supervision to learn relations from Wikipedia, treating infoboxes as the associated database (Wu and Weld, 2007); Wu et al. (2008) extended the system to use smoothing over an automatically generated infobox taxonomy. Mintz et al. (2009) used Freebase facts to heuristically label Wikipedia text and train relation extractors.
Funding
  • This material is based upon work supported by a WRF / ...; the views expressed do not necessarily reflect the view of the Air Force Research Laboratory (AFRL).
Reference
  • Michele Banko and Oren Etzioni. 2008. The tradeoffs between open and traditional relation extraction. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL-08), pages 28–36.
  • Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), pages 2670–2676.
  • Kedar Bellare and Andrew McCallum. 2007. Learning extractors from unlabeled text using relevant databases. In Sixth International Workshop on Information Integration on the Web.
  • Razvan Bunescu and Raymond Mooney. 2007. Learning to extract relations from the web using minimal supervision. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-07).
  • Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. 2010. Toward an architecture for never-ending language learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-10).
  • Michael Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP-2002).
  • Mark Craven and Johan Kumlien. 1999. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, pages 77–86.
  • Thomas G. Dietterich, Richard H. Lathrop, and Tomas Lozano-Perez. 1997. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89:31–71, January.
  • Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-05), pages 363–370.
  • Raphael Hoffmann, Congle Zhang, and Daniel S. Weld. 2010. Learning 5000 relational extractors. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-10), pages 286–295.
  • Percy Liang, A. Bouchard-Cote, Dan Klein, and Ben Taskar. 2006. An end-to-end discriminative approach to machine translation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL-06).