Extensive experiments on a large-scale Yelp heterogeneous network show that the Projected Metric Embedding (PME) model significantly outperforms state-of-the-art heterogeneous network embedding methods.

PME: Projected Metric Embedding on Heterogeneous Networks for Link Prediction.

KDD, pp. 1177-1186 (2018)


Abstract

Heterogeneous information network embedding aims to embed heterogeneous information networks (HINs) into low-dimensional spaces, in which each vertex is represented as a low-dimensional vector and both the global and local network structures of the original space are preserved. However, most existing heterogeneous information network embeddi...

Introduction
  • In the era of Big Data, large-scale information networks such as social networks, publication networks, e-commerce information networks, and knowledge base graphs are becoming ubiquitous in the real world.
  • For large-scale information networks, the traditional graph-based representation poses a great challenge, because of its high computational complexity [8], to the many applications that search and mine information in them, such as link prediction, node classification, clustering, and recommendation [33,34,35,36,37,38].
  • This has motivated considerable research interest [8] in network embedding techniques, which embed information networks into low-dimensional vector spaces where every vertex is represented as a low-dimensional vector.
  • Various search and mining tasks can then be carried out efficiently in the embedded space with off-the-shelf multidimensional indexing approaches and machine learning techniques for vector spaces.
Highlights
  • The model adapts well to the data sparsity that is inherent in real-world heterogeneous information networks (HINs).
  • The authors attribute this advantage to the fact that the Projected Metric Embedding (PME) model captures first-order and second-order proximity among nodes simultaneously in a more geometrically flexible way.
  • The authors propose PME, a novel model for embedding heterogeneous information networks that handles node and link heterogeneity by modelling them in relation-specific spaces.
  • To optimize PME, the authors introduce a loss-aware adaptive positive sampling strategy that counters the heavy skew of the link distribution across relations and speeds up model convergence.
  • Extensive experiments on a large-scale Yelp heterogeneous network show that PME significantly outperforms state-of-the-art heterogeneous network embedding methods.
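The idea of measuring distances in relation-specific spaces can be illustrated with a small sketch. It assumes, as one plausible reading of the summary, that each relation has its own projection matrix, that node distance is the squared Euclidean distance after projection, and that training uses a margin-based triplet loss. The function names, the toy embeddings, and the margin value are all hypothetical; the exact formulation is in the paper itself.

```python
def project(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def projected_sq_distance(matrix, a, b):
    """Squared Euclidean distance between a and b after projecting both."""
    pa, pb = project(matrix, a), project(matrix, b)
    return sum((x - y) ** 2 for x, y in zip(pa, pb))


def hinge_triplet_loss(matrix, anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther away than the positive."""
    d_pos = projected_sq_distance(matrix, anchor, positive)
    d_neg = projected_sq_distance(matrix, anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


# Toy data (hypothetical): one shared 3-d embedding per node and
# one 2x3 projection matrix per relation type.
embeddings = {
    "user_a": [1.0, 0.0, 0.0],
    "user_b": [1.0, 0.1, 0.0],
    "user_c": [0.0, 0.0, 3.0],
}
relation_matrices = {
    # This projection keeps the first and third coordinates only.
    "user-user": [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
}

m_uu = relation_matrices["user-user"]
d_ab = projected_sq_distance(m_uu, embeddings["user_a"], embeddings["user_b"])
d_ac = projected_sq_distance(m_uu, embeddings["user_a"], embeddings["user_c"])
loss = hinge_triplet_loss(
    m_uu, embeddings["user_a"], embeddings["user_b"], embeddings["user_c"]
)
```

In this toy setup the projection discards the coordinate on which user_a and user_b differ, so the pair is at distance zero in the user-user space while remaining distinct in the shared space. That is the kind of flexibility a per-relation projection provides.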
Methods
  • The authors compare the proposed model with the following recent embedding methods for heterogeneous networks:

    metapath2vec [9]: leverages random walks guided by predefined metapaths [23] to construct the heterogeneous neighbourhood of a node, and applies a heterogeneous skip-gram model to perform node embedding.
  • The authors extend the EOE model by constructing bipartite heterogeneous networks and treating them as homogeneous networks.
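To make the metapath-guided random walk used by the metapath2vec baseline more concrete, here is a minimal sketch. It is not the metapath2vec implementation: the graph, the node types, and the function name `metapath_walk` are hypothetical, and the walk simply picks uniformly among neighbours of the type the metapath asks for next.

```python
import random


def metapath_walk(neighbors, node_type, start, metapath, walk_length, rng):
    """Random walk that only steps to neighbours whose type matches the metapath.

    `metapath` is a cyclic pattern such as ["user", "business", "user"]: its
    first and last types coincide, so the pattern repeats after each full pass.
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first metapath type")
    cycle = metapath[:-1]  # drop the repeated closing type
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in neighbors[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))
    return walk


# Toy user-business graph (hypothetical).
node_type = {"u1": "user", "u2": "user", "b1": "business", "b2": "business"}
neighbors = {
    "u1": ["b1", "b2"],
    "u2": ["b1"],
    "b1": ["u1", "u2"],
    "b2": ["u1"],
}

walk = metapath_walk(
    neighbors, node_type, "u1", ["user", "business", "user"], 5, random.Random(0)
)
walk_types = [node_type[n] for n in walk]
```

The resulting walks play the role of sentences for a skip-gram model, so nodes that co-occur along the metapath end up with similar embeddings.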
Results
  • The authors report experimental results on social link prediction accuracy and binary link classification.

    5.4.1 Social Link Prediction Accuracy.
  • The model adapts well to the data sparsity that is inherent in real-world HINs. The authors attribute this advantage to the fact that the PME model captures first-order and second-order proximity among nodes simultaneously in a more geometrically flexible way.
Conclusion
  • The authors propose PME, a novel model for embedding heterogeneous information networks that handles node and link heterogeneity by modelling them in relation-specific spaces.
  • To optimize the PME model, the authors introduce a loss-aware adaptive positive sampling strategy that counters the heavy skew of the link distribution across relations and speeds up model convergence.
  • Extensive experiments on a large-scale Yelp heterogeneous network show that the PME model significantly outperforms state-of-the-art heterogeneous network embedding methods.
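The summary does not spell out how the loss-aware adaptive positive sampling works. One way such a strategy could look, purely as an assumption, is to pick the relation for the next training batch with probability proportional to that relation's recent average loss, so that poorly fitted relations are visited more often regardless of how many links they have. The function names, the `floor` parameter, and the loss values below are hypothetical.

```python
import random


def relation_sampling_weights(avg_loss, floor=1e-6):
    """Normalise per-relation losses into sampling probabilities.

    `floor` keeps relations with zero loss from being starved entirely.
    """
    clipped = {rel: max(loss, floor) for rel, loss in avg_loss.items()}
    total = sum(clipped.values())
    return {rel: value / total for rel, value in clipped.items()}


def sample_relation(weights, rng):
    """Draw one relation according to the given probabilities."""
    relations = list(weights)
    probabilities = [weights[rel] for rel in relations]
    return rng.choices(relations, weights=probabilities, k=1)[0]


# Hypothetical running average of the loss for each relation type.
avg_loss = {"user-user": 3.0, "user-business": 1.0, "business-city": 0.0}

weights = relation_sampling_weights(avg_loss)
picked = sample_relation(weights, random.Random(42))
```

In a training loop the averages would be refreshed after every batch, so the sampling distribution follows the model's current weaknesses instead of the raw link counts.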
Tables
  • Table 1: Yelp network statistics
  • Table 2: Statistics on AZ network
  • Table 3: Prediction accuracy in terms of MRR
  • Table 4: AUC scores on NV network
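The tables report MRR and AUC, and the analysis elsewhere on this page quotes Hit@20. For readers unfamiliar with these metrics, the sketch below computes them from their standard definitions. The helper names and toy inputs are hypothetical, and the paper's exact evaluation protocol (candidate sets, tie handling) is not given here.

```python
def reciprocal_rank(ranked, target):
    """1 / position of `target` in `ranked` (1-based), or 0.0 if absent."""
    for position, item in enumerate(ranked, start=1):
        if item == target:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (ranked_list, target) pairs."""
    return sum(reciprocal_rank(r, t) for r, t in queries) / len(queries)


def hit_at_k(queries, k):
    """Fraction of queries whose target appears in the top k results."""
    return sum(1 for r, t in queries if t in r[:k]) / len(queries)


def auc(positive_scores, negative_scores):
    """Probability that a positive outscores a negative (ties count as 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Toy ranking queries: (ranked candidates, true link target).
queries = [
    (["a", "b", "c"], "a"),  # target ranked first
    (["a", "b", "c"], "c"),  # target ranked third
    (["a", "b"], "z"),       # target not retrieved
]

mrr = mean_reciprocal_rank(queries)
hit2 = hit_at_k(queries, 2)
auc_score = auc([0.9, 0.4], [0.5, 0.1])
```

MRR and Hit@k evaluate ranked link prediction, while AUC evaluates binary link classification from scores.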
Related Work
  • We first introduce the related methods of general network embedding, and then discuss recent work on heterogeneous network embedding.

    2.1 Network Embedding

    Originally, graph or network embedding methods were proposed as tools for reducing the dimensionality of network features. Examples include linear methods based on SVD [27], multi-dimensional scaling (MDS) [39], IsoMap [2], spectral clustering [17], and Laplacian Eigenmap [29]. The idea behind these methods is to learn low-dimensional latent factors that preserve the majority of network features. However, they are not applicable to today's large information networks because of their low efficiency and high computational complexity.

    Another graph embedding method, graph factorization [1], obtains low-dimensional latent embeddings of a large graph through matrix factorization over the network edges. It represents graphs as matrices whose elements correspond to edges between vertices. However, graph factorization methods preserve only the linkage information of directly linked nodes, so they are insufficient for learning the high-order proximity of a network.

    Representation learning on knowledge graphs is also related to our work. Representative methods such as [4] and the Trans-family models (TransE [3], TransH [30], TransR [15]) have been shown to be effective for modelling knowledge bases. Our idea of building projection matrices for different relations is inspired by TransR but serves a different purpose: alleviating geometric inflexibility when performing metric learning.

    Recently, with the advances in language modelling [16], the skip-gram algorithm has shown its strength in modelling sentences by capturing the co-occurrence of neighbouring words. Inspired by this idea, DeepWalk [18] was proposed to embed network structures by using local information obtained from truncated random walks as the equivalent of sentences. Along this line of research, node2vec [11] is another representative method.

    In addition, LINE [26] was proposed as an efficient network embedding method and has proven robust and effective on large-scale information networks. Although it is designed to preserve both the local and global proximity of network vertices, it does not consider the heterogeneity of complex information networks.
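The DeepWalk idea of treating truncated random walks as sentences can be sketched as follows. This is a simplified illustration, not DeepWalk's implementation: it generates one uniform random walk and the (center, context) pairs that a skip-gram model would be trained on. The graph, the function names, and the window size are hypothetical.

```python
import random


def truncated_random_walk(neighbors, start, walk_length, rng):
    """Uniform random walk of at most `walk_length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < walk_length and neighbors[walk[-1]]:
        walk.append(rng.choice(neighbors[walk[-1]]))
    return walk


def skipgram_pairs(walk, window):
    """(center, context) pairs for every node within `window` steps in the walk."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Toy path graph a - b - c (hypothetical).
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

walk = truncated_random_walk(neighbors, "a", 4, random.Random(1))
pairs = skipgram_pairs(["a", "b", "c"], window=1)
```

Feeding such pairs to a skip-gram objective pulls together the embeddings of nodes that appear near each other on walks, which is how local structure is captured without factorizing the full adjacency matrix.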
Funding
  • This work was supported by an ARC Discovery Early Career Researcher Award (Grant No. DE160100308), ARC Discovery Projects (Grant No. DP170103954 and Grant No. DP160104075), and a New Staff Research Grant of The University of Queensland (Grant No. 613134).
  • It was also partially supported by the National Natural Science Foundation of China (Grant No. 61572335).
Study Subjects and Analysis
Users: 162,345
The model's advantage grows as the network becomes sparser. For example, the AZ dataset contains 162,345 users, who form a very sparse user-user network (only 1,518,610 links, a sparsity level of 99.994%). At Hit@20, the model achieves 3.6x, 35x, 4.26x, and 3.28x the performance of EOE, PTE, metapath2vec, and node2vec, respectively, as indicated in Figure 2(c).
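The quoted sparsity level can be checked with a few lines of arithmetic. The source does not state which denominator was used. The sketch below assumes sparsity is one minus the ratio of observed links to all n x n ordered user pairs, which reproduces the quoted figure, and also shows the value for unordered pairs for comparison.

```python
users = 162_345
links = 1_518_610

# Assumption: every ordered pair of users (including self-pairs) is a possible link.
ordered_pairs = users * users
sparsity_ordered = 100 * (1 - links / ordered_pairs)

# Alternative assumption: undirected links between distinct users.
unordered_pairs = users * (users - 1) // 2
sparsity_unordered = 100 * (1 - links / unordered_pairs)

print(f"{sparsity_ordered:.3f}%")    # matches the quoted 99.994%
print(f"{sparsity_unordered:.3f}%")  # slightly lower under the alternative
```

Under either convention, fewer than two in every ten thousand possible user-user links are observed.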

References
  • [1] Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, and Alexander J Smola. 2013. Distributed large-scale natural graph factorization. In WWW. 37–48.
  • [2] Mukund Balasubramanian and Eric L Schwartz. 2002. The isomap algorithm and topological stability. Science 295, 5552 (2002), 7–7.
  • [3] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In NIPS. 2787–2795.
  • [4] Antoine Bordes, Jason Weston, Ronan Collobert, Yoshua Bengio, et al. 2011. Learning Structured Embeddings of Knowledge Bases. In AAAI, Vol. 6. 6.
  • [5] Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. 2015. Heterogeneous network embedding via deep architectures. In KDD. 119–128.
  • [6] Ting Chen, Yizhou Sun, Yue Shi, and Liangjie Hong. 2017. On Sampling Strategies for Neural Network-based Collaborative Filtering. In KDD. 767–776.
  • [7] Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-n recommendation tasks. In RecSys. ACM, 39–46.
  • [8] Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. 2017. A Survey on Network Embedding. CoRR abs/1711.08752 (2017).
  • [9] Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD. 135–144.
  • [10] Tom Fawcett. 2006. An introduction to ROC analysis. Pattern recognition letters 27, 8 (2006), 861–874.
  • [11] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD. 855–864.
  • [12] Cheng-Kang Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge Belongie, and Deborah Estrin. 2017. Collaborative metric learning. In WWW. 193–201.
  • [13] Zhipeng Huang and Nikos Mamoulis. 2017. Heterogeneous Information Network Embedding for Meta Path based Proximity. arXiv preprint arXiv:1701.05291 (2017).
  • [14] Yann Jacob, Ludovic Denoyer, and Patrick Gallinari. 2014. Learning latent representations of nodes for classifying in heterogeneous social networks. In WSDM. 373–382.
  • [15] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In AAAI, Vol. 15. 2181–2187.
  • [16] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS. 3111–3119.
  • [17] Andrew Y Ng, Michael I Jordan, and Yair Weiss. 2002. On spectral clustering: Analysis and an algorithm. In NIPS. 849–856.
  • [18] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In KDD. 701–710.
  • [19] Steffen Rendle and Christoph Freudenthaler. 2014. Improving pairwise learning for item recommendation from implicit feedback. In WSDM. 273–282.
  • [20] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Uncertainty in Artificial Intelligence. 452–461.
  • [21] Yizhou Sun, Rick Barber, Manish Gupta, Charu C Aggarwal, and Jiawei Han. 2011. Co-author relationship prediction in heterogeneous bibliographic networks. In ASONAM. IEEE, 121–128.
  • [22] Yizhou Sun, Jiawei Han, Charu C Aggarwal, and Nitesh V Chawla. 2012. When will it happen?: relationship prediction in heterogeneous information networks. In WSDM. 663–672.
  • [23] Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S Yu, and Tianyi Wu. 2011. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. VLDB 4, 11 (2011), 992–1003.
  • [24] Yizhou Sun, Brandon Norick, Jiawei Han, Xifeng Yan, Philip S Yu, and Xiao Yu. 2013. Pathselclus: Integrating meta-path selection with user-guided object clustering in heterogeneous information networks. TKDD 7, 3 (2013), 11.
  • [25] Jian Tang, Meng Qu, and Qiaozhu Mei. 2015. Pte: Predictive text embedding through large-scale heterogeneous text networks. In KDD. 1165–1174.
  • [26] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW. 1067–1077.
  • [27] Lei Tang and Huan Liu. 2009. Scalable learning of collective behavior based on sparse social dimensions. In CIKM. 1107–1116.
  • [28] Yi Tay, Anh Tuan Luu, and Siu Cheung Hui. 2017. Translational Recommender Networks. CoRR abs/1707.05176 (2017).
  • [29] Myo Thida, How-Lung Eng, and Paolo Remagnino. 2013. Laplacian eigenmap with temporal constraints for local abnormality detection in crowded scenes. IEEE Transactions on Cybernetics 43, 6 (2013), 2147–2156.
  • [30] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge Graph Embedding by Translating on Hyperplanes. In AAAI, Vol. 14. 1112–1119.
  • [31] Min Xie, Hongzhi Yin, Hao Wang, Fanjiang Xu, Weitong Chen, and Sen Wang. 2016. Learning graph-based poi embedding for location-based recommendation. In CIKM. 15–24.
  • [32] Linchuan Xu, Xiaokai Wei, Jiannong Cao, and Philip S Yu. 2017. Embedding of Embedding (EOE): Joint Embedding for Coupled Heterogeneous Networks. In WSDM. 741–749.
  • [33] Hongzhi Yin, Hongxu Chen, Xiaoshuai Sun, Hao Wang, Yang Wang, and Quoc Viet Hung Nguyen. 2017. SPTF: A Scalable Probabilistic Tensor Factorization Model for Semantic-Aware Behavior Prediction. In ICDM. 585–594.
  • [34] Hongzhi Yin, Bin Cui, Yizhou Sun, Zhiting Hu, and Ling Chen. 2014. LCARS: A spatial item recommender system. TOIS (2014), 11.
  • [35] Hongzhi Yin, Bin Cui, Xiaofang Zhou, Weiqing Wang, Zi Huang, and Shazia Sadiq. 2016. Joint modeling of user check-in behaviors for real-time point-of-interest recommendation. TOIS (2016).
  • [36] Hongzhi Yin, Weiqing Wang, Hao Wang, Ling Chen, and Xiaofang Zhou. 2017. Spatial-Aware Hierarchical Collaborative Deep Learning for POI Recommendation. TKDE (2017), 2537–2551.
  • [37] Hongzhi Yin, Xiaofang Zhou, Bin Cui, Hao Wang, Kai Zheng, and Quoc Viet Hung Nguyen. 2016. Adapting to user interest drift for poi recommendation. ICDE 28, 10 (2016), 2566–2581.
  • [38] Hongzhi Yin, Lei Zou, Quoc Viet Hung Nguyen, Zi Huang, and Xiaofang Zhou. 2018. Joint event-partner recommendation in event-based social networks. ICDE.
  • [39] Lubomir Zlatkov. 1978. Multidimensional Scaling (MDS). (1978).