# Knowledge Graph Embedding by Translating on Hyperplanes

AAAI 2014, pp. 1112–1119

Abstract

We deal with embedding a large scale knowledge graph composed of entities and relations into a continuous vector space. TransE is a promising method proposed recently, which is very efficient while achieving state-of-the-art predictive performance. We discuss some mapping properties of relations which should be considered in embedding, su…

Introduction

- Knowledge graphs such as Freebase (Bollacker et al 2008), WordNet (Miller 1995) and GeneOntology (Ashburner et al 2000) have become very important resources supporting many AI-related applications, such as web/mobile search, Q&A, etc.
- A knowledge graph is a multi-relational graph composed of entities as nodes and relations as different types of edges.
- Two major difficulties are: (1) A knowledge graph is a symbolic and logical system while applications often involve numerical computing in continuous spaces; (2) It is difficult to aggregate global knowledge over a graph.
- Even the embedding representation of a single entity/relation encodes global information from the whole knowledge graph.
- For any candidate triplet (h, r, t), the authors can confirm its correctness by checking the compatibility of the representations h and t under the operation characterized by r.
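This compatibility check can be illustrated with a TransE-style dissimilarity score. The following is a minimal numpy sketch; the function name and toy vectors are ours for illustration, not the authors' code:

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """Dissimilarity ||h + r - t|| under the L1 or L2 norm.

    Lower score = more plausible triplet: a correct (h, r, t)
    should have t close to h + r.
    """
    diff = h + r - t
    if norm == 1:
        return np.abs(diff).sum()
    return np.sqrt((diff * diff).sum())

# Toy check: a tail constructed as t = h + r scores (near) zero.
h = np.array([0.1, 0.2])
r = np.array([0.3, -0.1])
t = h + r
```

A corrupted triplet (e.g., a shifted tail) then receives a strictly larger score, which is what ranking-based evaluation relies on.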

Highlights

- We show that the running time of TransH is comparable to that of TransE.
- TransH brings promising improvements over TransE on one-to-many, many-to-one, and many-to-many relations.
- On the larger set FB15k, TransE and TransH are much better than NTN.
- We have introduced TransH, a new model to embed a knowledge graph in a continuous vector space.
- TransH overcomes the flaws of TransE concerning the reflexive/one-to-many/many-to-one/many-to-many relations while inheriting its efficiency.

Methods

- The authors empirically study and evaluate related methods on three tasks: link prediction (Bordes et al 2013b), triplets classification (Socher et al 2013), and relational fact extraction (Weston et al 2013).
- The authors use the same two data sets used in TransE (Bordes et al 2011; 2013b): WN18, a subset of WordNet, and FB15k, a relatively dense subgraph of Freebase in which all entities are present in the Wikilinks database.
- Both are released in (Bordes et al 2013b).
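The link prediction protocol used on these data sets (replace the tail by every entity and rank the true one by dissimilarity) can be sketched as follows. This is an illustrative "raw"-setting sketch with made-up names and a toy embedding matrix, not the evaluation code itself:

```python
import numpy as np

def rank_of_tail(h, r, t_idx, entity_emb):
    """Rank of the true tail among all entities under ||h + r - e||_1.

    entity_emb is an (n_entities, k) matrix; every row is tried as a
    candidate tail, and the rank is 1 plus the number of candidates
    that score strictly better than the true tail.
    """
    scores = np.abs(entity_emb - (h + r)).sum(axis=1)
    return 1 + int((scores < scores[t_idx]).sum())

# Toy graph with three entities; the translation r points from
# entity 0 straight at entity 1, so entity 1 should rank first.
entity_emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
h = entity_emb[0]
r = np.array([1.0, 0.0])
```

Averaging this rank over test triplets gives the Mean metric, and the fraction of ranks ≤ 10 gives Hits@10.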

Results

- The results are reported in Table 3.
- The simple models TransE, TransH, and even the naive baseline Unstructured (i.e., TransE without translation) outperform other approaches on WN18 in terms of the Mean metric.
- This may be because the number of relations in WN18 is quite small so that it is acceptable to ignore the different types of relations.
- Notice that the number of relations in FB15k (1,345) is much larger than that in FB13 (13), while the numbers of entities are close.
- This means FB13 is a very dense subgraph in which strong correlations exist between entities.
- In this case, modeling the complex correlations between entities with tensors and nonlinear transformations helps.

Conclusion

- The authors have introduced TransH, a new model to embed a knowledge graph in a continuous vector space.
- TransH overcomes the flaws of TransE concerning the reflexive/one-to-many/many-to-one/many-to-many relations while inheriting its efficiency.
- Extensive experiments on the tasks of link prediction, triplet classification, and relational fact extraction show that TransH brings promising improvements to TransE.
- The trick of reducing false negative labels proposed in this paper is proven to be effective.

Summary

- TransE (Bordes et al 2013b) represents a relation by a translation vector r so that the pair of embedded entities in a triplet (h, r, t) can be connected by r with low error.
- As introduced in Introduction & Related Work (Table 1), TransE models a relation r as a translation vector r ∈ ℝᵏ and assumes the error ‖h + r − t‖ℓ1/ℓ2 is low if (h, r, t) is a golden triplet.
- We set different probabilities for replacing the head or tail entity when corrupting the triplet, which depends on the mapping property of the relation, i.e., one-to-many, many-to-one or many-to-many.
- We empirically study and evaluate related methods on three tasks: link prediction (Bordes et al 2013b), triplets classification (Socher et al 2013), and relational fact extraction (Weston et al 2013).
- We follow the same protocol as in TransE (Bordes et al 2013b): for each testing triplet (h, r, t), we replace the tail t by every entity e in the knowledge graph and calculate a dissimilarity score on the corrupted triplet (h, r, e).
- For FB15k not used in (Socher et al 2013), we implement TransE and TransH by ourselves, and use the released code for NTN.
- On all the three data sets, the trick of reducing false negative labeling helps both TransE and TransH.
- Following the same rule of combining the score from knowledge graph embedding with the score from the text side model, we can obtain the precision-recall curves for TransE and TransH, as shown in Figure 2 (a).
- From the figure we can see TransH consistently outperforms TransE as a “prior” model on improving the text side extraction method Sm2r.
- The results in Figure 2 (a) depend on the specific rule of combining the score from knowledge graph embedding with the score from text side model.
- Figure 2 (a) does not clearly demonstrate the separate capability of TransE/TransH as a stand-alone model for relational fact prediction.
- To clearly demonstrate the stand-alone capability of TransE/TransH, we first use the text side model Sm2r to assign each entity pair to the relation with the highest confidence score, keeping those facts where the assigned relation is not “NA”.
- Extensive experiments on the tasks of link prediction, triplet classification, and relational fact extraction show that TransH brings promising improvements to TransE.
- The trick of reducing false negative labels proposed in this paper is proven to be effective.
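The two ideas above, translation on a relation-specific hyperplane and the "bern." corruption trick, can be sketched as follows. This is a minimal illustration with our own variable names, not the released implementation:

```python
import numpy as np

def transh_score(h, r_vec, w_r, t):
    """TransH dissimilarity: project h and t onto the hyperplane with
    normal w_r, translate by r_vec, and take the L1 distance."""
    w = w_r / np.linalg.norm(w_r)   # keep the normal unit-length
    h_p = h - (h @ w) * w           # component of h in the hyperplane
    t_p = t - (t @ w) * w
    return np.abs(h_p + r_vec - t_p).sum()

def bern_head_prob(tph, hpt):
    """'bern.' trick: probability of corrupting the head, computed from
    the average tails per head (tph) and heads per tail (hpt), so that
    many-to-one relations corrupt heads more often (and vice versa)."""
    return tph / (tph + hpt)

# Toy example: h and t differ along the normal direction, but their
# in-plane components are connected exactly by r_vec.
w_r = np.array([0.0, 0.0, 2.0])
r_vec = np.array([0.5, 0.0, 0.0])
h = np.array([0.0, 0.0, 1.0])
t = np.array([0.5, 0.0, 2.0])
```

Because only the in-plane components matter, one relation can hold between entity pairs whose raw embeddings differ, which is how TransH handles reflexive and many-to-many relations that break plain TransE.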

- Table1: Different embedding models: the scoring functions fr(h, t) and the model complexity (the number of parameters). ne and nr are the numbers of unique entities and relations, respectively; it is often the case that nr ≪ ne. k is the dimension of the embedding space; s is the number of hidden nodes of a neural network or the number of slices of a tensor
- Table2: Data sets used in the experiments
- Table3: Link prediction results
- Table4: Results on FB15k by relation category
- Table5: Hits@10 of TransE and TransH on some examples of one-to-many∗, many-to-one†, many-to-many‡, and reflexive§ relations
- Table6: Triplet classification: accuracies (%). “40h”, “5m” and “30m” in the brackets are the running (wall clock) time

Related work

- The most related work is briefly summarized in Table 1. All these methods embed entities into a vector space and enforce the embeddings to be compatible under a scoring function. Different models differ in the definition of the scoring function fr(h, t), which implies some transformation on h and t.

TransE (Bordes et al 2013b) represents a relation by a translation vector r so that the pair of embedded entities in a triplet (h, r, t) can be connected by r with low error. TransE is very efficient while achieving state-of-the-art predictive performance. However, it has flaws in dealing with reflexive/one-to-many/many-to-one/many-to-many relations.

Contributions

- Proposes TransH which models a relation as a hyperplane together with a translation operation on it
- Proposes a simple trick to reduce the chance of false negative labeling
- Proposes a model which enables an entity to have distributed representations when involved in different relations

Study subjects and analysis

data sets: 2

Rather than requiring one best answer, this task emphasizes ranking a set of candidate entities from the knowledge graph. We use the same two data sets used in TransE (Bordes et al 2011; 2013b): WN18, a subset of WordNet, and FB15k, a relatively dense subgraph of Freebase in which all entities are present in the Wikilinks database. Both are released in (Bordes et al 2013b).

data sets: 3

It is used in (Socher et al 2013) to evaluate the NTN model. Three data sets are used in this task. Two of them are the same as in NTN (Socher et al 2013): WN11, a subset of WordNet, and FB13, a subset of Freebase.

data sets: 3

Concerning running time, the cost of NTN is much higher than that of TransE/TransH. In addition, on all three data sets, the trick of reducing false negative labeling (the results with “bern.”) helps both TransE and TransH. In NTN (Socher et al 2013), results of combining it with word embeddings (Mikolov et al 2013) are also reported.

Reference

- Ashburner, M.; Ball, C. A.; Blake, J. A.; Botstein, D.; Butler, H.; Cherry, J. M.; Davis, A. P.; Dolinski, K.; Dwight, S. S.; Eppig, J. T.; et al. 2000. Gene ontology: Tool for the unification of biology. Nature genetics 25(1):25–29.
- Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; and Taylor, J. 2008. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 1247–1250. ACM.
- Bordes, A.; Weston, J.; Collobert, R.; and Bengio, Y. 2011. Learning structured embeddings of knowledge bases. In Proceedings of the 25th AAAI Conference on Artificial Intelligence.
- Bordes, A.; Glorot, X.; Weston, J.; and Bengio, Y. 2012. A semantic matching energy function for learning with multirelational data. Machine Learning 1–27.
- Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013a. Irreflexive and hierarchical relations as translations. arXiv preprint arXiv:1304.7158.
- Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013b. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems 26. Curran Associates, Inc. 2787–2795.
- Chang, K.-W.; Yih, W.-t.; and Meek, C. 2013. Multirelational latent semantic analysis. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1602–1612. Seattle, Washington, USA: Association for Computational Linguistics.
- Collobert, R., and Weston, J. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th Annual International Conference on Machine Learning (ICML 2008), 160–167. Omnipress.
- Finkel, J. R.; Grenager, T.; and Manning, C. 2005. Incorporating non-local information into information extraction systems by gibbs sampling. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 363–370. Association for Computational Linguistics.
- Hoffmann, R.; Zhang, C.; Ling, X.; Zettlemoyer, L. S.; and Weld, D. S. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting on Association for Computational Linguistics, 541–550. Association for Computational Linguistics.
- Jenatton, R.; Roux, N. L.; Bordes, A.; and Obozinski, G. R. 2012. A latent factor model for highly multi-relational data. In Advances in Neural Information Processing Systems 25. Curran Associates, Inc. 3167–3175.
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26. Curran Associates, Inc. 3111–3119.
- Miller, G. A. 1995. Wordnet: A lexical database for english. Communications of the ACM 38(11):39–41.
- Mintz, M.; Bills, S.; Snow, R.; and Jurafsky, D. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2, 1003–1011. Association for Computational Linguistics.
- Nickel, M.; Tresp, V.; and Kriegel, H.-P. 2011. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), ICML ’11, 809–816. New York, NY, USA: ACM.
- Riedel, S.; Yao, L.; and McCallum, A. 2010. Modeling relations and their mentions without labeled text. In Machine Learning and Knowledge Discovery in Databases, 148–163. Springer.
- Socher, R.; Chen, D.; Manning, C. D.; and Ng, A. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems 26. Curran Associates, Inc. 926–934.
- Surdeanu, M.; Tibshirani, J.; Nallapati, R.; and Manning, C. D. 2012. Multi-instance multi-label learning for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 455–465. Association for Computational Linguistics.
- Sutskever, I.; Tenenbaum, J. B.; and Salakhutdinov, R. 2009. Modelling relational data using bayesian clustered tensor factorization. In Advances in Neural Information Processing Systems 22. Curran Associates, Inc. 1821–1828.
- Weston, J.; Bordes, A.; Yakhnenko, O.; and Usunier, N. 2013. Connecting language and knowledge bases with embedding models for relation extraction. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1366–1371.
