Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks

KDD, pp. 2190-2199, 2018.

Keywords:
edge representation, network embedding, logistic regression, heterogeneous information network, comprehensive transcription

Abstract:

Heterogeneous information networks (HINs) are ubiquitous in real-world applications. In the meantime, network embedding has emerged as a convenient tool to mine and learn from networked data. As a result, it is of interest to develop HIN embedding methods. However, the heterogeneity in HINs introduces not only rich information but also po...

Introduction
  • Heterogeneous information networks (HINs) have received increasing attention in the past decade due to their ubiquity and their capability of representing rich information [21, 25].
  • Network embedding learns low-dimensional vector representations of nodes that encode their semantic information in the original network.
  • The authors study the problem of comprehensive transcription of heterogeneous information networks, which aims purely to transcribe the rich and potentially incompatible information from HINs into the embeddings, without involving additional expertise, feature engineering, or the introduction of supervision.
Highlights
  • Heterogeneous information networks (HINs) have received increasing attention in the past decade due to their ubiquity and their capability of representing rich information [21, 25].
  • We summarize our contributions as follows: (1) We propose to study the problem of comprehensive transcription of heterogeneous information networks in embedding learning, which preserves the rich information in heterogeneous information networks and provides an easy-to-use approach to unleash their power.
  • (2) We identify that different extents of semantic incompatibility exist in real-world heterogeneous information networks, which pose challenges to their comprehensive transcription.
  • With the availability of the edge representations proposed in this paper, future work includes exploring more loss functions over edge representations, such as regression to model edges associated with ratings in heterogeneous information networks that have user-item reviews, or softmax to model heterogeneous information networks where at most one edge type can exist between a pair of node types.
Methods
  • To provide a general-purpose, easy-to-use solution to HIN embedding, the authors describe the HEER model, where HEER stands for Heterogeneous Information Network Embedding via Edge Representations.
  • Afterward, the model inference method is described.
  • With the use of edge representation, the authors expect the embedding to infer the existence and the type of edge between each pair of nodes.
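  • For edge type r ∈ R, the authors formulate the typed closeness of node pair (u, v) atop their edge embedding g_{uv} as
    $$ s_r(u, v) := \frac{\exp(\mu_r^\top g_{uv})}{\sum_{\tilde{v} \in P_r(u, *)} \exp(\mu_r^\top g_{u\tilde{v}}) + \sum_{\tilde{u} \in P_r(*, v)} \exp(\mu_r^\top g_{\tilde{u}v})} $$
    for (u, v) ∈ P_r, and 0 otherwise, where μ_r is a vector specific to edge type r.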
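The typed closeness idea can be illustrated with a short Python sketch. It rests on several assumptions that this summary does not confirm: the edge embedding is taken to be the element-wise product of the two node embeddings, each edge type r has a weight vector `mu[r]`, and the exponentiated score of a pair is normalized over the type-r edges that share either endpoint. All names (`edge_embedding`, `typed_closeness`, `emb`, `mu`, `edges`) and the toy data are hypothetical.

```python
import math


def edge_embedding(f_u, f_v):
    """Element-wise product of two node embeddings (an assumed choice)."""
    return [a * b for a, b in zip(f_u, f_v)]


def exp_score(mu_r, g):
    """exp(mu_r . g): unnormalized score of an edge embedding under type r."""
    return math.exp(sum(m * x for m, x in zip(mu_r, g)))


def typed_closeness(u, v, r, emb, mu, edges):
    """Score of (u, v) under edge type r, normalized over type-r edges
    that start at u or end at v. Returns 0.0 if (u, v) is not a type-r edge."""
    pairs = edges[r]
    if (u, v) not in pairs:
        return 0.0
    numer = exp_score(mu[r], edge_embedding(emb[u], emb[v]))
    # Edges (u, *) of type r, then edges (*, v) of type r.
    denom = sum(exp_score(mu[r], edge_embedding(emb[a], emb[b]))
                for (a, b) in pairs if a == u)
    denom += sum(exp_score(mu[r], edge_embedding(emb[a], emb[b]))
                 for (a, b) in pairs if b == v)
    return numer / denom


# Toy, hypothetical data: two authors, two papers, one edge type.
emb = {"a1": [1.0, 0.0], "a2": [0.0, 1.0], "p1": [1.0, 1.0], "p2": [1.0, 0.0]}
mu = {"writes": [1.0, 1.0]}
edges = {"writes": {("a1", "p1"), ("a1", "p2"), ("a2", "p1")}}

s_edge = typed_closeness("a1", "p1", "writes", emb, mu, edges)
s_missing = typed_closeness("a2", "p2", "writes", emb, mu, edges)
print(s_edge, s_missing)
```

Because the pair (u, v) itself appears in both sums of the denominator, a score computed this way never exceeds 0.5.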
Results
  • It can be seen that over 95% of nodes have a generalized Jaccard coefficient smaller than 5e−5 between authorship and publishing year, while less than 25% of nodes fall in the same category when it comes to authorship vs. term usage.
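For readers unfamiliar with the metric, the generalized Jaccard coefficient of two nonnegative vectors is the sum of element-wise minima divided by the sum of element-wise maxima. The sketch below shows only that formula; how the paper builds the per-node vectors for each edge type is not stated in this summary, so the input vectors and the function name `generalized_jaccard` are hypothetical, and returning 0.0 for two all-zero vectors is a convention chosen here.

```python
def generalized_jaccard(x, y):
    """Generalized Jaccard coefficient of two nonnegative vectors:
    sum_i min(x_i, y_i) / sum_i max(x_i, y_i)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(a < 0 for a in x) or any(b < 0 for b in y):
        raise ValueError("vectors must be nonnegative")
    numer = sum(min(a, b) for a, b in zip(x, y))
    denom = sum(max(a, b) for a, b in zip(x, y))
    # Convention chosen here: two all-zero vectors get coefficient 0.0.
    return numer / denom if denom > 0 else 0.0


# Hypothetical vectors over a shared index.
jaccard_example = generalized_jaccard([1, 0, 2], [0, 1, 2])
print(jaccard_example)
```

A coefficient near zero, as reported for authorship vs. publishing year, means the two vectors have almost no overlapping mass.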
Conclusion
  • The authors studied the problem of the comprehensive transcription of HINs in embedding learning, which preserves the rich information in HINs and provides an easy-to-use approach to unleash the power of HINs.
  • Experiments and in-depth case studies with large real-world datasets demonstrate the effectiveness of HEER and the utility of edge representations and heterogeneous metrics.
  • The authors leave the exploration of this direction to future works.
  • It is worth studying how to further boost the performance of HEER by incorporating higher-order structures such as network motifs, while retaining HEER's ability to preserve the rich semantics of HINs.
Tables
  • Table 1: Basic statistics for the DBLP and YAGO networks
  • Table 2: Per-edge-type, micro-average, and macro-average MRR achieved by each model in the edge reconstruction task
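As a reminder of how such averages are commonly computed, the sketch below derives per-edge-type, micro-average, and macro-average mean reciprocal rank (MRR) from lists of ranks. It assumes the usual reading, in which the micro-average pools all test instances and the macro-average is the unweighted mean of the per-edge-type MRRs; the paper's exact evaluation protocol is not given in this summary. The function name `mrr_report` and the ranks are hypothetical.

```python
def mrr_report(ranks_by_type):
    """Compute per-type, micro-average, and macro-average MRR.

    ranks_by_type maps an edge type to the 1-based ranks at which the
    true item was retrieved for each test instance of that type.
    """
    per_type = {}
    pooled = []
    for edge_type, ranks in ranks_by_type.items():
        reciprocal = [1.0 / r for r in ranks]
        per_type[edge_type] = sum(reciprocal) / len(reciprocal)
        pooled.extend(reciprocal)
    micro = sum(pooled) / len(pooled)               # every instance counts equally
    macro = sum(per_type.values()) / len(per_type)  # every edge type counts equally
    return per_type, micro, macro


# Hypothetical ranks for two edge types.
per_type, micro, macro = mrr_report({"writes": [1, 2], "cites": [4]})
print(per_type, micro, macro)
```

The two averages diverge when edge types have very different numbers of test instances, which is why reporting both is informative.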
Related work
  • Homogeneous network embedding. Network embedding has emerged as an efficient and effective representation learning approach for networked data [4, 7, 9, 16, 18, 19, 29, 32, 37], which significantly spares the labor and resources needed to transform networks into features that are more machine-actionable. Early network embedding algorithms started with simple, homogeneous networks, and many of them trace back to the skip-gram model [13], which aims to learn word representations such that words with similar contexts have similar representations [7, 18, 19, 29].

    Besides skip-gram, algorithms for preserving certain other homogeneous network properties have also been studied [10, 15, 16, 31, 32, 33].

    The use of edge representations for homogeneous network embedding is discussed in a recent work [1], but such edge representations are designed to distinguish the direction of an edge, instead of encoding richer semantics such as edge type in our case.

    Heterogeneous network embedding. Heterogeneous information networks (HINs) have been extensively studied over the past decade for their ubiquity in real-world data and their efficacy in fulfilling tasks such as classification, clustering, recommendation, and outlier detection [21, 25, 27, 34, 38]. To marry the advantages of HINs and network embedding, a couple of algorithms have been proposed very recently for embedding learning in heterogeneous information networks [2, 5, 6, 8, 20, 23, 28]. One line of work first uses human expertise or supervision to select meta-paths for a given task or to limit the scope of candidate meta-paths, and then proposes methods to transfer the semantics encoded in the meta-paths to the learned embedding [5, 6, 20]. While this direction has been shown to be effective in solving problems that fit the semantics of the chosen meta-paths, it differs from our research scope: these methods mostly focus on providing quality representations for downstream tasks concerning the node types at the two ends of the chosen meta-paths, while we aim to develop methods that transcribe the entire HIN to embeddings as comprehensively as possible.

    Beyond meta-paths, some approaches have been proposed to embed specific kinds of HINs [8, 28] with specific objectives such as representing event data or learning predictive text embeddings. Some other approaches study HINs with additional side information [2] and cannot be generalized to all HINs. Besides, all of these approaches embed the input HIN into only one metric space. Embedding in the context of HINs has also been studied for tasks with additional supervision [3, 11, 17]. These methods yield features specific to given tasks and are outside the scope of the unsupervised HIN embedding that we study.
Funding
  • Acknowledgments. This work was sponsored in part by U.S. Army Research Lab. under Cooperative Agreement No. W911NF09-2-0053 (NSCTA), DARPA under Agreement No. W911NF-17-C0099, National Science Foundation IIS 16-18481, IIS 17-04532, and IIS-17-41317, DTRA HDTRA11810026, and grant 1U54GM114838 awarded by NIGMS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov).
Reference
  • [1] Sami Abu-El-Haija, Bryan Perozzi, and Rami Al-Rfou. 2017. Learning Edge Representations via Low-Rank Asymmetric Projections. In CIKM.
  • [2] Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. 2015. Heterogeneous network embedding via deep architectures. In KDD.
  • [3] Ting Chen and Yizhou Sun. 2017. Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification. In WSDM.
  • [4] Hanjun Dai, Bo Dai, and Le Song. 2016. Discriminative embeddings of latent variable models for structured data. In ICML.
  • [5] Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD.
  • [6] Tao-yang Fu, Wang-Chien Lee, and Zhen Lei. 2017. HIN2Vec: Explore Metapaths in Heterogeneous Information Networks for Representation Learning. In CIKM.
  • [7] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD.
  • [8] Huan Gui, Jialu Liu, Fangbo Tao, Meng Jiang, Brandon Norick, and Jiawei Han. 2016. Large-scale embedding learning in heterogeneous event data. In ICDM.
  • [9] William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation Learning on Graphs: Methods and Applications. arXiv preprint arXiv:1709.05584 (2017).
  • [10] Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
  • [11] Zemin Liu, Vincent W Zheng, Zhou Zhao, Fanwei Zhu, Kevin Chen-Chuan Chang, Minghui Wu, and Jing Ying. 2017. Semantic Proximity Search on Heterogeneous Graph by Proximity Embedding. In AAAI.
  • [12] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. JMLR 9, Nov (2008), 2579–2605.
  • [13] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013.
  • [14] Maximillian Nickel and Douwe Kiela. 2017. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems. 6341–6350.
  • [15] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In ICML.
  • [16] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. 2016. Asymmetric Transitivity Preserving Graph Embedding. In KDD. 1105–1114.
  • [17] Shirui Pan, Jia Wu, Xingquan Zhu, Chengqi Zhang, and Yang Wang. 2016. Triparty deep network representation. In IJCAI.
  • [18] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In KDD.
  • [19] Leonardo FR Ribeiro, Pedro HP Saverese, and Daniel R Figueiredo. 2017. struc2vec: Learning node representations from structural identity. In KDD.
  • [20] Jingbo Shang, Meng Qu, Jialu Liu, Lance M Kaplan, Jiawei Han, and Jian Peng. 2016. Meta-Path Guided Embedding for Similarity Search in Large-Scale Heterogeneous Information Networks. arXiv preprint arXiv:1610.09769 (2016).
  • [21] Chuan Shi, Yitong Li, Jiawei Zhang, Yizhou Sun, and Philip S Yu. 2017. A survey of heterogeneous information network analysis. TKDE 29, 1 (2017), 17–37.
  • [22] Yu Shi, Po-Wei Chan, Honglei Zhuang, Huan Gui, and Jiawei Han. 2017. PReP: Path-Based Relevance from a Probabilistic Perspective in Heterogeneous Information Networks. In KDD.
  • [23] Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, and Jiawei Han. 2018. AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks. In SDM.
  • [24] Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In WWW.
  • [25] Yizhou Sun and Jiawei Han. 2013. Mining heterogeneous information networks: a structural analysis approach. SIGKDD Explorations 14, 2 (2013), 20–28.
  • [26] Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S Yu, and Tianyi Wu. 2011. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. In VLDB.
  • [27] Yizhou Sun, Yintao Yu, and Jiawei Han. 2009. Ranking-based clustering of heterogeneous information networks with star network schema. In KDD.
  • [28] Jian Tang, Meng Qu, and Qiaozhu Mei. 2015. PTE: Predictive text embedding through large-scale heterogeneous text networks. In KDD. ACM.
  • [29] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW.
  • [30] Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. Arnetminer: extraction and mining of academic social networks. In KDD.
  • [31] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph Attention Networks. In ICLR.
  • [32] Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining.
  • [33] Linchuan Xu, Xiaokai Wei, Jiannong Cao, and Philip S Yu. 2018. On Exploring Semantic Meanings of Links for Embedding Social Networks. In WWW.
  • [34] Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, and Jiawei Han. 2014. Personalized entity recommendation: A heterogeneous information network approach. In WSDM. ACM.
  • [35] Baichuan Zhang and Mohammad Al Hasan. 2017. Name Disambiguation in Anonymized Graphs using Network Embedding. In CIKM.
  • [36] Chao Zhang, Liyuan Liu, Dongming Lei, Quan Yuan, Honglei Zhuang, Tim Hanratty, and Jiawei Han. 2017. TrioVecEvent: Embedding-Based Online Local Event Detection in Geo-Tagged Tweet Streams. In KDD. ACM.
  • [37] Muhan Zhang and Yixin Chen. 2017. Weisfeiler-Lehman neural machine for link prediction. In KDD. 575–583.
  • [38] Honglei Zhuang, Jing Zhang, George Brova, Jie Tang, Hasan Cam, Xifeng Yan, and Jiawei Han. 2014. Mining query-based subnetwork outliers in heterogeneous information networks. In ICDM.