# An Attention-based Collaboration Framework for Multi-View Network Representation Learning

CIKM, pp. 1767-1776, 2017.


Abstract:

Learning distributed node representations in networks has been attracting increasing attention recently due to its effectiveness in a variety of applications. Existing approaches usually study networks with a single type of proximity between nodes, which defines a single view of a network. However, in reality there usually exists multiple…


Introduction

- Mining and analyzing large-scale information networks has attracted a lot of attention recently due to their wide applications in the real world.
- There is a growing interest in representing networks into low-dimensional spaces (a.k.a. network embedding) [10, 20, 26], where each node is represented with a low-dimensional vector
- Such vector representations are able to preserve the proximities between nodes, which can be treated as features and benefit a variety of downstream applications, such as node classification [20, 26], link prediction [10] and node visualization [26].

Highlights

- Mining and analyzing large-scale information networks has attracted a lot of attention recently due to their wide applications in the real world
- The goal of the collaboration framework is to capture the node proximities encoded in individual views and integrate them to vote for the robust node representations. Therefore, for each node i, we introduce a set of view-specific representations {x_i^k}_{k=1}^K to preserve the structure information encoded in individual views
- We compare the robust node representations learned by MVE, node2vec-merge and MVE-NoCollab, and we report the performances on different node groups. The results are presented in Figure 3. The left groups contain nodes with larger degrees, in which the data are quite dense, while the right groups contain nodes with smaller degrees, and the data are very sparse
- We studied learning node representations for networks with multiple views
- We proposed an effective framework to let different views collaborate with each other and vote for the robust node representations across different views
- We evaluated the performance of our proposed approach on five real-world networks with multiple views
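The "voting" idea behind the collaboration framework can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the paper's implementation: the function name `robust_representation` and the attention logits are hypothetical stand-ins for the weights the paper learns during training.

```python
import numpy as np

def robust_representation(view_embeddings, attention_logits):
    """Combine K view-specific embeddings of one node into a single
    robust representation via a softmax-weighted vote over views."""
    logits = attention_logits - attention_logits.max()  # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()     # softmax over views
    return weights @ view_embeddings                    # (d,) weighted vote

# Toy example: K = 3 views, d = 4 dimensions (random stand-in data).
rng = np.random.default_rng(0)
views = rng.normal(size=(3, 4))        # one row per view-specific x_i^k
logits = np.array([2.0, 0.5, 0.5])     # higher score -> larger voting weight
robust = robust_representation(views, logits)
assert robust.shape == (4,)
```

With uniform logits this reduces to an equal-weight average of the views, which is the special case used by earlier multi-view methods.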

Results

- Experimental results on real-world networks show that the proposed approach outperforms existing state-of-the-art approaches for network representation learning with a single view and other competitive approaches with multiple views.
- The authors conduct extensive experiments on various real-world multi-view networks.
- Experimental results on both the multi-label node classification task and the link prediction task show that the proposed approach outperforms state-of-the-art approaches for learning node representations with individual views and other competitive approaches with multiple views.
- Comparing the running time of MVE and MVE-NoAttn, the authors observe that the weight learning process in MVE takes less than 15% of the total running time on both datasets, which shows the good efficiency of the attention-based approach for weight learning

Conclusion

- The authors studied learning node representations for networks with multiple views.
- The authors evaluated the performance of the proposed approach on five real-world networks with multiple views.
- Experimental results on both the node classification task and the link prediction task demonstrated the effectiveness and efficiency of the proposed framework.
- One promising direction is learning node representations for heterogeneous information networks, i.e., networks with multiple types of nodes and edges
- In such networks, each meta-path [23] characterizes a type of proximity between the nodes, and various meta-paths yield networks with multiple views
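The meta-path point can be made concrete with a toy bipartite author-paper network (hypothetical data, not from the paper): multiplying incidence matrices along a meta-path yields one view of the network per meta-path.

```python
import numpy as np

# Toy heterogeneous network: 3 authors, 2 papers (hypothetical data).
# A_ap[i, j] = 1 iff author i wrote paper j.
A_ap = np.array([[1, 0],
                 [1, 1],
                 [0, 1]])

# The meta-path Author-Paper-Author induces one view of the author network:
# entry (i, j) counts papers co-authored by authors i and j.
view_apa = A_ap @ A_ap.T
assert view_apa[0, 1] == 1   # authors 0 and 1 co-wrote paper 0
assert view_apa[0, 2] == 0   # authors 0 and 2 share no paper
```

Longer meta-paths (e.g. Author-Paper-Venue-Paper-Author) would give further, distinct views of the same author set, which is what makes such networks natural inputs for a multi-view approach.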


- Table1: Statistics of the datasets
- Table2: Table 2
- Table3: Table 3
- Table4: Efficiency study. Our approach has close running time to LINE and node2vec. Learning weights of views takes less than 15% of the running time on both datasets
- Table5: Examples of nearest neighbors according to similarity calculated by view-specific node representations and robust node representations on the DBLP dataset

Related work

- Our work is related to the existing scalable approaches for learning network representations, including DeepWalk [20], LINE [26] and node2vec [10], which use different search strategies to exploit the network structures: depth-first search, breadth-first search, and a combination of the two strategies. However, all these approaches focus on learning node representations for networks with a single view, while we study networks with multiple views.

Running times from the efficiency study (Table 4):

| Algorithm | LINE | node2vec | MVE-NoAttn | MVE |
| --- | --- | --- | --- | --- |
| DBLP | 91.45 s | 144.77 s | 105.05 s | 120.38 s |
| Twitter | 589.29 s | 981.96 s | 732.26 s | 847.65 s |

The other line of related work is multi-view learning, which aims to exploit information from multiple views and has shown effectiveness in various tasks such as classification [2, 12, 29], clustering [3, 12, 33, 37], ranking [34], topic modeling [28] and activity recovery [36]. The work most similar to ours is the multi-view clustering [3, 12, 33, 37] and multi-view matrix factorization [9, 16, 22] methods. For example, Kumar et al. [12] proposed a spectral clustering framework to regularize the clustering hypotheses across different views. Liu et al. [33] proposed a multi-view nonnegative matrix factorization model, which aims to minimize the distance between the coefficient matrix of each view and the consensus matrix. Our multi-view network representation approach shares similar intuition with these pieces of work, aiming to find robust data representations across multiple views. However, a major difference is that existing approaches assign equal weights to all views, while our approach adopts an attention-based method, which learns different voting weights of views for different nodes.
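The contrast between equal view weights and per-node attention weights can be sketched as follows. This uses toy random data and our own variable names; in the actual approach the attention scores are learned during training rather than sampled.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, d = 5, 3, 4                      # nodes, views, embedding dimension
X = rng.normal(size=(N, K, d))         # view-specific embeddings x_i^k

# Equal weights (prior multi-view methods): every node averages its views.
R_equal = X.mean(axis=1)

# Per-node attention: one softmax over views for each node, so different
# nodes can place different trust in different views.
logits = rng.normal(size=(N, K))       # stand-in for learned attention scores
W = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
R_attn = np.einsum('nk,nkd->nd', W, X)

assert R_equal.shape == R_attn.shape == (N, d)
assert np.allclose(W.sum(axis=1), 1.0)  # each node's weights sum to one
```

Equal weighting is recovered as the special case where every node's weight vector is uniform (1/K for each view).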

Funding

- Research was sponsored in part by the U.S. Army Research Lab. under Cooperative Agreement No. W911NF-09-2-0053 (NSCTA), National Science Foundation IIS-1320617 and IIS 16-18481, and grant 1U54GM114838 awarded by NIGMS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov). The views and conclusions contained in this document are those of the author(s) and should not be interpreted as representing the official policies of the U.S. Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon
- Research was partially supported by the National Natural Science Foundation of China (NSFC Grant Nos. 61472006, 61772039 and 91646202)

Reference

- D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
- A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the eleventh annual conference on Computational learning theory, pages 92–100. ACM, 1998.
- K. Chaudhuri, S. M. Kakade, K. Livescu, and K. Sridharan. Multi-view clustering via canonical correlation analysis. In Proceedings of the 26th annual international conference on machine learning, pages 129–136. ACM, 2009.
- J. Chorowski, D. Bahdanau, K. Cho, and Y. Bengio. End-to-end continuous speech recognition using attention-based recurrent NN: First results. arXiv preprint arXiv:1412.1602, 2014.
- M. De Domenico, A. Lima, P. Mougel, and M. Musolesi. The anatomy of a scientific rumor. Scientific Reports, 3, 2013.
- R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. Liblinear: A library for large linear classification. The Journal of Machine Learning Research, 9:1871–1874, 2008.
- T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861–874, 2006.
- A. Franceschini, D. Szklarczyk, S. Frankild, M. Kuhn, M. Simonovic, A. Roth, J. Lin, P. Minguez, P. Bork, C. Von Mering, et al. String v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Research, 41(D1):D808–D815, 2013.
- D. Greene and P. Cunningham. A matrix factorization approach for integrating multiple data views. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 423–438.
- A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864. ACM, 2016.
- P. Jaillet, G. Song, and G. Yu. Airline network design and hub location problems. Location science, 4(3):195–212, 1996.
- A. Kumar, P. Rai, and H. Daume. Co-regularized multi-view spectral clustering. In Advances in Neural Information Processing Systems, pages 1413–1421, 2011.
- O. Levy and Y. Goldberg. Neural word embedding as implicit matrix factorization. In Advances in neural information processing systems, pages 2177–2185, 2014.
- D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.
- A. Liberzon, A. Subramanian, R. Pinchback, H. Thorvaldsdóttir, P. Tamayo, and J. P. Mesirov. Molecular signatures database (MSigDB) 3.0. Bioinformatics, 27(12):1739–1740, 2011.
- J. Liu, C. Wang, J. Gao, and J. Han. Multi-view clustering via joint nonnegative matrix factorization. In Proc. of SDM, volume 13, pages 252–260. SIAM, 2013.
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013.
- A. Mnih and Y. W. Teh. A fast and simple algorithm for training neural probabilistic language models. arXiv preprint arXiv:1206.6426, 2012.
- V. Mnih, N. Heess, A. Graves, et al. Recurrent models of visual attention. In Advances in Neural Information Processing Systems, pages 2204–2212, 2014.
- B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. arXiv preprint arXiv:1403.6652, 2014.
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Cognitive modeling, 5(3):1.
- A. P. Singh and G. J. Gordon. Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 650–658. ACM, 2008.
- Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment, 4(11):992–1003, 2011.
- Y. Sun, Y. Yu, and J. Han. Ranking-based clustering of heterogeneous information networks with star network schema. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 797–806. ACM, 2009.
- J. Tang, M. Qu, and Q. Mei. Pte: Predictive text embedding through large-scale heterogeneous text networks. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1165–1174. ACM, 2015.
- J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. Line: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pages 1067–1077. ACM, 2015.
- J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. Arnetminer: extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 990–998. ACM, 2008.
- J. Tang, M. Zhang, and Q. Mei. One theme in all views: modeling consensus topics in multiple contexts. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 5–13. ACM, 2013.
- W. Wang and Z.-H. Zhou. A new analysis of co-training. In Proceedings of the 27th international conference on machine learning (ICML-10), pages 1135–1142, 2010.
- X. Wang, L. Tang, H. Liu, and L. Wang. Learning with multi-resolution overlapping communities. Knowledge and Information Systems (KAIS), 2012.
- S. Wasserman and K. Faust. Social network analysis: Methods and applications, volume 8. Cambridge university press, 1994.
- S. J. Wright. Coordinate descent algorithms. Mathematical Programming, 151(1):3–34, 2015.
- T. Xia, D. Tao, T. Mei, and Y. Zhang. Multiview spectral embedding. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 40(6):1438–1446, 2010.
- J. Yu, Y. Rui, and B. Chen. Exploiting click constraints and multi-view features for image re-ranking. IEEE Transactions on Multimedia, 16(1):159–168, 2014.
- R. Zafarani and H. Liu. Social computing data repository at ASU, 2009.
- C. Zhang, K. Zhang, Q. Yuan, H. Peng, Y. Zheng, T. Hanratty, S. Wang, and J. Han. Regions, periods, activities: Uncovering urban dynamics via cross-modal representation learning. In Proceedings of the 26th International Conference on World Wide Web, pages 361–370. International World Wide Web Conferences Steering Committee, 2017.
- D. Zhou and C. J. Burges. Spectral clustering and transductive learning with multiple views. In Proceedings of the 24th international conference on Machine learning, pages 1159–1166. ACM, 2007.
