Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2017

This paper introduces a single high-capacity recurrent neural network (RNN) model which allows chains of reasoning across multiple relation types.
Our goal is to combine the rich multi-step inference of symbolic logical reasoning with the generalization capabilities of neural networks. We are particularly interested in complex reasoning about entities and relations in text and large-scale knowledge bases (KBs). Neelakantan et al. (2015) use RNNs to compose the distributed semantics o…
- There is a rising interest in extending neural networks to perform more complex reasoning, formerly addressed only by symbolic and logical reasoning systems.
- The “matrix completion” mechanism that underlies the common implementation of Universal Schema can be seen as a simple type of reasoning, as can other work in tensor factorization (Nickel et al., 2011; Bordes et al., 2013; Socher et al., 2013).
- These methods can be understood as operating on single pieces of evidence: for example, inferring that Microsoft–located-in–Seattle implies Microsoft– HQ-in–Seattle.
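The matrix-completion view above can be sketched in a few lines: entity pairs index rows, relation types index columns, and a cell's plausibility is the dot product of the two embeddings. This is a minimal illustration, not the paper's trained model; all names and values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4  # embedding size (illustrative)

# Universal Schema / matrix-completion sketch: one embedding per
# entity PAIR (row) and one per relation type (column).
pair_emb = {("Microsoft", "Seattle"): rng.normal(size=dim)}
rel_emb = {"located-in": rng.normal(size=dim),
           "HQ-in": rng.normal(size=dim)}

def score(pair, relation):
    """Plausibility of relation(pair) = dot product of the embeddings."""
    return float(pair_emb[pair] @ rel_emb[relation])

# Observing located-in(Microsoft, Seattle) during training would pull
# its score up; correlated relations such as HQ-in then score high too,
# which is the single-piece-of-evidence inference described above.
print(score(("Microsoft", "Seattle"), "HQ-in"))
```

Note that each inference here uses one cell of the matrix at a time, which is exactly why the paper moves to composing multi-step paths.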
- The Path-RNN model combines all the relations in π sequentially using an RNN, with an intermediate representation h_t ∈ R^h at step t given by h_t = f(W_hh h_{t−1} + W_in y_{r_t})
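The recurrence above can be sketched as follows. For brevity this collapses the per-target-relation parameter sets (W_hh, W_in are shared here) and uses ReLU for the nonlinearity f; relation names and all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
h, d = 5, 5  # hidden size and relation-embedding size (illustrative)

# Parameters of the Path-RNN recurrence h_t = f(W_hh h_{t-1} + W_in y_{r_t}).
W_hh = rng.normal(scale=0.1, size=(h, h))
W_in = rng.normal(scale=0.1, size=(h, d))
rel_emb = {r: rng.normal(size=d)
           for r in ["was born in", "commonly known as"]}

def compose_path(path):
    """Fold the relations of a path into one vector, left to right."""
    h_t = np.zeros(h)
    for r in path:
        # y_{r_t} is the embedding of the relation at step t.
        h_t = np.maximum(0.0, W_hh @ h_t + W_in @ rel_emb[r])
    return h_t

v = compose_path(["was born in", "commonly known as"])
print(v.shape)  # (5,)
```

The final h_T is then compared against the target relation's embedding to score the path.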
- Example Horn clauses: (i) place.birth(a, b) ← ‘was born in’(a, x) ∧ ‘commonly known as’(x, b); (ii) location.contains(a, b) ← nationality⁻¹(a, x) ∧ place.birth(x, b); (iii) book.characters(a, b) ← ‘aka’(a, x) ∧ …⁻¹(x, b); (iv) cause.death(a, b) ← ‘contracted’(a, b). One prominent method for populating knowledge bases (KBs) from text is Universal Schema (Riedel et al., 2013; Verga et al., 2016), which learns vector embeddings of relation types: the union of all input relation types, both from the schemas of multiple structured KBs and from expressions of relations in natural language text
- This paper presents multiple modeling advances that significantly increase the accuracy and practicality of RNN-based reasoning on Horn clause chains in large-scale KBs. (1) Previous work, including Lao et al. (2011), Neelakantan et al. (2015), and Guu et al. (2015), reasons about chains of relations, but not about the entities that form the nodes of the path
- We apply our models to the dataset released by Neelakantan et al. (2015), which is a subset of Freebase enriched with information from ClueWeb
- The dataset comprises a set of triples (e1, r, e2) and the set of paths connecting each entity pair (e1, e2) in the knowledge graph
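The two dataset components can be illustrated with a tiny toy graph: triples define edges, and paths between an entity pair are enumerated by a bounded graph walk. The entities, relations, and the depth-first enumeration below are all hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical miniature knowledge graph in (e1, r, e2) triple form.
triples = [
    ("obama", "was born in", "honolulu"),
    ("honolulu", "commonly known as", "hawaii_city"),
    ("obama", "nationality", "usa"),
]

# Adjacency list: entity -> list of (relation, neighbor).
graph = defaultdict(list)
for e1, r, e2 in triples:
    graph[e1].append((r, e2))

def paths(src, dst, max_len=3):
    """Enumerate relation paths from src to dst up to max_len hops (DFS)."""
    out, stack = [], [(src, [])]
    while stack:
        node, rels = stack.pop()
        if node == dst and rels:
            out.append(tuple(rels))
        if len(rels) < max_len:
            for r, nxt in graph[node]:
                stack.append((nxt, rels + [r]))
    return out

print(paths("obama", "hawaii_city"))
```

Each enumerated relation path is an input sequence for the Path-RNN; the real dataset ships these paths precomputed rather than walking the graph at training time.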
- On chains of reasoning in WordNet we reduce error in mean quantile by 84% versus the previous state of the art
- The triples extracted from ClueWeb consist of sentences that contain entities linked to Freebase (Orr et al., 2013).
- The WordNet dataset has just 22 relation types and 38,194 entities, which is orders of magnitude smaller than the dataset the authors use for relation extraction tasks
- This paper introduces a single high capacity RNN model which allows chains of reasoning across multiple relation types.
- It leverages information from the intermediate entities present in the path between an entity pair and mitigates the problem of unseen entities by representing them as a function of their annotated types.
- The authors address the problem of reasoning about infrequently occurring relations and show significant performance gains via multitasking
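The type-based entity representation mentioned above can be sketched simply: type embeddings are shared across entities, and an entity's vector is a function (here, the mean, one simple choice) of the embeddings of its annotated types, so an entity never seen in training still gets a sensible vector if its types were seen. Type names and values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 4  # embedding size (illustrative)

# Shared embeddings for annotated Freebase types (hypothetical names).
type_emb = {
    "/people/person": rng.normal(size=dim),
    "/government/politician": rng.normal(size=dim),
}

def entity_vector(types):
    """Represent an entity as the mean of its type embeddings, so an
    unseen entity is covered as long as its types appeared in training."""
    return np.mean([type_emb[t] for t in types], axis=0)

v = entity_vector(["/people/person", "/government/politician"])
print(v.shape)  # (4,)
```

This also keeps the parameter count tied to the (small) number of types rather than the (huge) number of entities.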
- Table 1: Several highly probable clauses learnt by our model. The textual relations are shown in quotes and italicized. Our model has the ability to combine textual and schema relations. r⁻¹ is the inverse of relation r, i.e. r(a, b) ⇔ r⁻¹(b, a)
- Table 2: Statistics of the dataset
- Table 3: The first section shows the effectiveness of LogSumExp as the score-aggregation function; the next section compares performance with existing multi-hop approaches; and the last section shows the performance achieved using joint reasoning with entities and types
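The LogSumExp aggregation referenced in Table 3 pools the scores of all paths between an entity pair as a smooth maximum, so gradients flow to every path rather than only the best one. A minimal, numerically stable sketch:

```python
import numpy as np

def logsumexp_pool(path_scores):
    """Smooth-max pooling over the scores of all paths between an
    entity pair; shifting by the max keeps exp() from overflowing."""
    s = np.asarray(path_scores, dtype=float)
    m = s.max()
    return float(m + np.log(np.exp(s - m).sum()))

scores = [1.0, 3.0, 2.0]  # hypothetical per-path scores for one pair
pooled = logsumexp_pool(scores)
assert pooled > max(scores)  # upper-bounds the max, unlike hard max
print(round(pooled, 3))
```

Compared with max pooling, every path contributes a nonzero gradient, which is what makes the aggregation effective in training.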
- Table 4: Model performance when trained with a small fraction of the data
- Table 5: Bodies of two clauses, both of which are predictive of location.contains(x, y). The first clause is universally true, but the truth value of the second depends on the entities in the clause. The model without entity parameters cannot discriminate between the two and outputs a lower overall confidence score
- Table 6: Performance on path queries in WordNet
- Two early works on extracting clauses and reasoning over paths are SHERLOCK (Schoenmackers et al., 2010) and the Path Ranking Algorithm (PRA) (Lao et al., 2011). SHERLOCK extracts purely symbolic clauses by exhaustively exploring relational paths of increasing length. PRA replaces exhaustive search with random walks; observed paths are used as features for a per-target-relation binary classifier. Lao et al. (2012) extend PRA by augmenting KB-schema relations with observed text patterns. However, these methods do not generalize well to the millions of distinct paths obtained from random exploration of the KB, since each unique path is treated as a singleton and no commonalities between paths are modeled. In response, pre-trained vector representations have been used in PRA to tackle the feature explosion (Gardner et al., 2013; Gardner et al., 2014), but these still rely on a classifier over atomic path features. Yang et al. (2015) also extract Horn rules, but restrict them to length 3, with literals restricted to schema types in the knowledge base. Zeng et al. (2016) show improvements in relation extraction by incorporating sentences which…
- On a large-scale Freebase + ClueWeb prediction task, we achieve a 25% error reduction, and a 53% error reduction on sparse relations
- In addition to efficiency advantages, our approach significantly increases accuracy because the multi-task nature of the training shares strength in the common RNN parameters
- In comparison with the previous best on this data, we achieve an error reduction of 25% in mean average precision (MAP)
- In comparison with the previous state of the art (Guu et al., 2015), our model achieves an 84% reduction in error in mean quantile
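The mean quantile metric behind this comparison measures, per query, the fraction of negative candidate answers ranked below the correct one, then averages over queries; "error" is one minus that value. A minimal sketch (the scores below are hypothetical):

```python
def quantile(pos_score, neg_scores):
    """Fraction of negatives scored strictly below the correct answer
    (Guu et al., 2015-style quantile; 1.0 means a perfect ranking)."""
    return sum(s < pos_score for s in neg_scores) / len(neg_scores)

def mean_quantile(queries):
    """Average the per-query quantiles; error = 1 - mean quantile."""
    qs = [quantile(p, negs) for p, negs in queries]
    return sum(qs) / len(qs)

# One path query: correct tail entity's score vs. typed negatives.
q = quantile(0.9, [0.2, 0.5, 0.95, 0.1])
print(q)  # 0.75
```

An 84% error reduction thus means the gap between the mean quantile and 1.0 shrinks to 16% of the baseline's gap.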
- We achieve the best performance when we represent entities as a function of their annotated types in Freebase (Single-Model + Types) (p < 0.005)
- Our model achieves an 84% reduction in error when compared to their best model
- Jonathan Berant, Ido Dagan, and Jacob Goldberger. 2011. Global learning of typed entailment rules. In NAACL.
- Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In NIPS.
- Samuel R. Bowman, Christopher Potts, and Christopher D. Manning. 2014. Recursive neural networks for learning logical semantics. CoRR.
- Rich Caruana. 1997. Multitask learning. Machine Learning.
- Kai-Wei Chang, Wen-tau Yih, Bishan Yang, and Christopher Meek. 2014. Typed tensor decomposition of knowledge bases for relation extraction. In EMNLP.
- Matt Gardner, Partha Pratim Talukdar, Bryan Kisiel, and Tom M. Mitchell. 2013. Improving learning and inference in a large knowledge-base using latent syntactic cues. In EMNLP.
- Matt Gardner, Partha Talukdar, Jayant Krishnamurthy, and Tom Mitchell. 2014. Incorporating vector space similarity in random walk inference over knowledge bases. In EMNLP.
- Edward Grefenstette. 2013. Towards a formal distributional semantics: Simulating logical calculi with tensors. In Lexical and Computational Semantics (*SEM).
- K. Guu, J. Miller, and P. Liang. 2015. Traversing knowledge graphs in vector space. In EMNLP.
- Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR, abs/1412.6980.
- Ni Lao, Tom Mitchell, and William W. Cohen. 2011. Random walk inference and learning in a large scale knowledge base. In EMNLP, Stroudsburg, PA, USA.
- Ni Lao, Amarnag Subramanya, Fernando Pereira, and William W. Cohen. 2012. Reading the web with learned syntactic-semantic inference rules. In EMNLP.
- Quoc V. Le, Navdeep Jaitly, and Geoffrey E. Hinton. 2015. A simple way to initialize recurrent networks of rectified linear units. CoRR.
- Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval.
- Bonan Min, Ralph Grishman, Li Wan, Chang Wang, and David Gondek. 2013. Distant supervision for relation extraction with an incomplete knowledge base. In NAACL.
- Arvind Neelakantan, Benjamin Roth, and Andrew McCallum. 2015. Compositional vector space models for knowledge base completion. In ACL, Beijing, China.
- Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In ICML.
- Dave Orr, Amar Subramanya, Evgeniy Gabrilovich, and Michael Ringgaard. 2013. 11 billion clues in 800 million documents: A…
- Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI.
- Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. 2013. Relation extraction with matrix factorization and universal schemas. In NAACL.
- Tim Rocktäschel and Sebastian Riedel. 2016. Learning knowledge base inference with neural theorem provers. In AKBC, NAACL.
- Dan Roth and Wen-tau Yih. 2007. Global inference for entity and relation identification via a linear programming formulation. In Introduction to SRL.
- Stefan Schoenmackers, Oren Etzioni, Daniel S. Weld, and Jesse Davis. 2010. Learning first-order horn clauses from web text. In EMNLP.
- Iulian Vlad Serban, Alberto García-Durán, Çağlar Gülçehre, Sungjin Ahn, Sarath Chandar, Aaron Courville, and Yoshua Bengio. 2016. Generating factoid questions with recurrent neural networks: The 30m factoid question-answer corpus. In ACL.
- Sameer Singh, Sebastian Riedel, Brian Martin, Jiaping Zheng, and Andrew McCallum. 2013. Joint inference of entities, relations, and coreference. In AKBC, CIKM.
- Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In NIPS.
- Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. 2015. Representing text for joint embedding of text and knowledge bases. In EMNLP, September.
- Kristina Toutanova, Xi Victoria Lin, Scott Wen tau Yih, Hoifung Poon, and Chris Quirk. 2016. Compositional learning of embeddings for relation paths in knowledge bases and text. ACL.
- Patrick Verga, David Belanger, Emma Strubell, Benjamin Roth, and Andrew McCallum. 2016. Multilingual relation extraction using compositional universal schema.
- Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding entities and relations for learning and inference in knowledge bases. ICLR.
- Wenyuan Zeng, Yankai Lin, Zhiyuan Liu, and Maosong Sun. 2016. Incorporating relation paths in neural relation extraction. CoRR.