node2vec: Scalable Feature Learning for Networks
KDD 2016: 855–864
Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks.
- Many important tasks in network analysis involve predictions over nodes and edges.
- In a typical node classification task, the authors are interested in predicting the most probable labels of nodes in a network.
- In a social network, the authors might be interested in predicting interests of users, or in a protein-protein interaction network the authors might be interested in predicting functional labels of proteins [25, 37].
- In link prediction, the authors wish to predict whether a pair of nodes in a network should have an edge connecting them.
- We extend node2vec and other feature learning methods based on neighborhood preserving objectives, from nodes to pairs of nodes for edge-based prediction tasks
- Since our random walks are naturally based on the connectivity structure between nodes in the underlying network, we extend them to pairs of nodes using a bootstrapping approach over the feature representations of the individual nodes
- Since none of the feature learning algorithms have been previously used for link prediction, we evaluate node2vec against popular heuristic scores that achieve good performance in link prediction
- We studied feature learning in networks as a search-based optimization problem
- Depth-first Sampling can freely explore network neighborhoods, which is important in discovering homophilous communities, at the cost of high variance
- The objective in Eq. 2 is independent of any downstream task, and the flexibility in exploration offered by node2vec lends the learned feature representations to a wide variety of network analysis settings discussed below.
4.1 Case Study: Les Misérables network
In Section 3.1 the authors observed that BFS and DFS strategies represent extreme ends on the spectrum of embedding nodes based on the principles of homophily and structural equivalence.
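The BFS/DFS extremes above are exactly what node2vec's 2nd-order biased random walk interpolates between, via a return parameter p and an in-out parameter q. A minimal sketch in plain Python (the adjacency-dict graph representation and function name are illustrative assumptions, not the paper's code):

```python
import random

def biased_walk(adj, start, length, p=1.0, q=1.0, seed=0):
    """Simulate one node2vec-style 2nd-order random walk.

    Low p keeps the walk local (BFS-like, structural equivalence);
    low q pushes it outward (DFS-like, homophily).
    """
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = adj[cur]
        if not nbrs:
            break
        if len(walk) == 1:
            # First step: no previous node, sample uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        # Unnormalized transition weights alpha_pq(prev, x):
        #   1/p if x == prev (return step),
        #   1   if x is also a neighbor of prev (distance 1),
        #   1/q otherwise (distance 2, moving outward).
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)
            elif x in adj[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

In the full pipeline, many such walks per node are fed to a skip-gram model to learn the embeddings.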
- The network has 77 nodes and 254 edges.
- The authors set d = 16 and run node2vec to learn a feature representation for every node in the network.
- The authors visualize the original network in two dimensions with nodes assigned colors based on their clusters
- The node feature representations are input to a one-vs-rest logistic regression classifier with L2 regularization.
- A general observation the authors can draw from the results is that the learned feature representations for node pairs significantly outperform the heuristic benchmark scores, with node2vec achieving the best AUC improvement of 12.6% on the arXiv dataset over the best-performing baseline (Adamic-Adar).
- Amongst the feature learning algorithms, node2vec outperforms both DeepWalk and LINE in all networks, with gains of up to 3.8% and 6.5%, respectively, in the AUC scores for the best possible choices of the binary operator for each algorithm.
- The Hadamard operator when used with node2vec is highly stable and gives the best performance on average across all networks
- The authors studied feature learning in networks as a search-based optimization problem.
- This perspective gives them multiple advantages.
- The authors observed that BFS can explore only limited neighborhoods.
- This makes BFS suitable for characterizing structural equivalences in networks that rely on the immediate local structure of nodes.
- DFS can freely explore network neighborhoods, which is important in discovering homophilous communities, at the cost of high variance
- Table 1: Choice of binary operators ◦ for learning edge features. The definitions correspond to the i-th component of g(u, v).
- Table 2: Macro-F1 scores for multi-label classification on the BlogCatalog, PPI (Homo sapiens), and Wikipedia word co-occurrence networks with 50% of the nodes labeled for training.
- Table 3: Link prediction heuristic scores for a node pair (u, v) with immediate neighbor sets N(u) and N(v), respectively.
- Table 4: Area Under Curve (AUC) scores for link prediction. Comparison with popular baselines and embedding-based methods bootstrapped using binary operators: (a) Average, (b) Hadamard, (c) Weighted-L1, and (d) Weighted-L2 (see Table 1 for definitions).
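The binary operators of Table 1 and the heuristics of Table 3 are simple enough to sketch directly. A minimal, illustrative Python version (function names are ours, not the paper's; the Adamic-Adar heuristic shown is one of the Table 3 scores):

```python
import math

# Table 1 operators: each maps node embeddings f(u), f(v) to the
# edge feature g(u, v), componentwise. Hadamard was the most stable
# choice on average in the paper's experiments.
def average(fu, fv):
    return [(a + b) / 2 for a, b in zip(fu, fv)]

def hadamard(fu, fv):
    return [a * b for a, b in zip(fu, fv)]

def weighted_l1(fu, fv):
    return [abs(a - b) for a, b in zip(fu, fv)]

def weighted_l2(fu, fv):
    return [(a - b) ** 2 for a, b in zip(fu, fv)]

# Adamic-Adar score for a pair (u, v): sum over common neighbors z of
# 1 / log(degree of z), given neighbor sets nu = N(u), nv = N(v) and a
# dict mapping each node to its neighbor set.
def adamic_adar(nu, nv, neighbors):
    return sum(1.0 / math.log(len(neighbors[z])) for z in nu & nv)
```

For link prediction, g(u, v) from one of the four operators is what gets fed to the downstream classifier, while Adamic-Adar and the other heuristics are used directly as ranking scores.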
- Feature engineering has been extensively studied by the machine learning community under various headings. In networks, the conventional paradigm for generating features for nodes is based on feature extraction techniques which typically involve some seed hand-crafted features based on network properties [8, 11]. In contrast, our goal is to automate the whole process by casting feature extraction as a representation learning problem in which case we do not require any hand-engineered features.
Unsupervised feature learning approaches typically exploit the spectral properties of various matrix representations of graphs, especially the Laplacian and the adjacency matrices. Under this linear algebra perspective, these methods can be viewed as dimensionality reduction techniques. Several linear (e.g., PCA) and non-linear (e.g., IsoMap) dimensionality reduction techniques have been proposed [3, 27, 30, 35]. These methods suffer from both computational and statistical performance drawbacks. In terms of computational efficiency, eigendecomposition of a data matrix is expensive unless the solution quality is significantly compromised with approximations, and hence these methods are hard to scale to large networks. Second, these methods optimize for objectives that are not robust to the diverse patterns observed in networks (such as homophily and structural equivalence) and make assumptions about the relationship between the underlying network structure and the prediction task. For instance, spectral clustering makes a strong homophily assumption that graph cuts will be useful for classification. Such assumptions are reasonable in many scenarios, but unsatisfactory for effectively generalizing across diverse networks.
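As a concrete illustration of the spectral family being critiqued, a minimal Laplacian-eigenmap-style embedding can be sketched as below (our own sketch with numpy; the toy "two triangles joined by a bridge" graph is illustrative). The `eigh` call is the expensive eigendecomposition step that makes these methods hard to scale:

```python
import numpy as np

def laplacian_embedding(A, d):
    """Embed the nodes of adjacency matrix A into d dimensions using
    eigenvectors of the unnormalized graph Laplacian L = D - A with
    the smallest nonzero eigenvalues (a Laplacian-eigenmap sketch)."""
    D = np.diag(A.sum(axis=1))
    L = D - A
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return vecs[:, 1:d + 1]          # skip the trivial constant eigenvector

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2, 3):
# the first nontrivial (Fiedler) coordinate separates the two clusters,
# reflecting the homophily assumption of graph-cut-based methods.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
X = laplacian_embedding(A, 2)
```

Nodes in the same triangle land on the same side of zero in the first embedding coordinate; this is exactly the homophily-style signal that spectral methods capture well, and the structural-equivalence patterns they miss.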
- This research has been supported in part by NSF CNS-1010921, IIS-1149837, NIH BD2K, ARO MURI, DARPA XDATA, DARPA SIMPLEX, Stanford Data Science Initiative, Boeing, Lightspeed, SAP, and Volkswagen.
- L. A. Adamic and E. Adar. Friends and neighbors on the web. Social networks, 25(3):211–230, 2003.
- L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, 2011.
- M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, 2001.
- Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE TPAMI, 35(8):1798–1828, 2013.
- B.-J. Breitkreutz, C. Stark, T. Reguly, L. Boucher, A. Breitkreutz, M. Livstone, R. Oughtred, D. H. Lackner, J. Bähler, V. Wood, et al. The BioGRID interaction database. Nucleic acids research, 36:D637–D640, 2008.
- S. Cao, W. Lu, and Q. Xu. GraRep: Learning Graph Representations with global structural information. In CIKM, 2015.
- S. Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75 – 174, 2010.
- B. Gallagher and T. Eliassi-Rad. Leveraging label-independent features for classification in sparsely labeled networks: An empirical study. In Lecture Notes in Computer Science: Advances in Social Network Mining and Analysis. Springer, 2009.
- Z. S. Harris. Distributional structure. Word, 10(2-3):146–162, 1954.
- K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, C. Faloutsos, and L. Li. RolX: structural role extraction & mining in large graphs. In KDD, 2012.
- K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos. It’s who you know: graph mining using recursive structural features. In KDD, 2011.
- P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. J. of the American Statistical Association, 2002.
- D. E. Knuth. The Stanford GraphBase: a platform for combinatorial computing, volume 37. Addison-Wesley Reading, 1993.
- J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
- K. Li, J. Gao, S. Guo, N. Du, X. Li, and A. Zhang. LRBM: A restricted boltzmann machine based approach for representation learning on linked data. In ICDM, 2014.
- X. Li, N. Du, H. Li, K. Li, J. Gao, and A. Zhang. A deep learning approach to link prediction in dynamic networks. In ICDM, 2014.
- Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel. Gated graph sequence neural networks. In ICLR, 2016.
- D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. J. of the American society for information science and technology, 58(7):1019–1031, 2007.
- A. Liberzon, A. Subramanian, R. Pinchback, H. Thorvaldsdóttir, P. Tamayo, and J. P. Mesirov. Molecular signatures database (MSigDB) 3.0. Bioinformatics, 27(12):1739–1740, 2011.
- M. Mahoney. Large text compression benchmark. www.mattmahoney.net/dc/textdata, 2011.
- T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. In ICLR, 2013.
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
- J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In EMNLP, 2014.
- B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations. In KDD, 2014.
- P. Radivojac, W. T. Clark, T. R. Oron, A. M. Schnoes, T. Wittkop, A. Sokolov, K. Graim, C. Funk, K. Verspoor, et al. A large-scale evaluation of computational protein function prediction. Nature methods, 10(3):221–227, 2013.
- B. Recht, C. Re, S. Wright, and F. Niu. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In NIPS, 2011.
- S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
- J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-scale Information Network Embedding. In WWW, 2015.
- L. Tang and H. Liu. Leveraging social media networks for classification. Data Mining and Knowledge Discovery, 23(3):447–478, 2011.
- J. B. Tenenbaum, V. De Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
- F. Tian, B. Gao, Q. Cui, E. Chen, and T.-Y. Liu. Learning deep representations for graph clustering. In AAAI, 2014.
- K. Toutanova, D. Klein, C. D. Manning, and Y. Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. In NAACL, 2003.
- G. Tsoumakas and I. Katakis. Multi-label classification: An overview. Dept. of Informatics, Aristotle University of Thessaloniki, Greece, 2006.
- A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani. Global protein function prediction from protein-protein interaction networks. Nature biotechnology, 21(6):697–700, 2003.
- S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin. Graph embedding and extensions: a general framework for dimensionality reduction. IEEE TPAMI, 29(1):40–51, 2007.
- J. Yang and J. Leskovec. Overlapping communities explain core-periphery organization of networks. Proceedings of the IEEE, 102(12):1892–1902, 2014.
- S.-H. Yang, B. Long, A. Smola, N. Sadagopan, Z. Zheng, and H. Zha. Like like alike: joint friendship and interest propagation in social networks. In WWW, 2011.
- R. Zafarani and H. Liu. Social computing data repository at ASU, 2009.
- S. Zhai and Z. Zhang. Dropout training of matrix factorization and autoencoder for link prediction in sparse graphs. In SDM, 2015.