# Faster Graph Embeddings via Coarsening

ICML 2020.

Abstract:

Graph embeddings are a ubiquitous tool for machine learning tasks, such as node classification and link prediction, on graph-structured data. However, computing the embeddings for large-scale graphs is prohibitively inefficient even if we are interested only in a small subset of relevant vertices. To address this, we present an efficient ...

Introduction

- Over the past several years, network embeddings have been demonstrated to be a remarkably powerful tool for learning unsupervised representations for nodes in a network (Perozzi et al, 2014; Tang et al, 2015; Grover and Leskovec, 2016).
- The objective is to learn a low-dimensional vector for each node that captures the structure of its neighborhood
- These embeddings have proved to be very effective for downstream machine learning tasks in networks, such as node classification and link prediction (Tang et al, 2015; Hamilton et al, 2017).
- The matrix factorization-based approaches typically require computing the singular value decomposition (SVD) of an associated (typically dense) matrix
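
To make the factorization bottleneck concrete, here is a minimal numpy sketch (the function name is ours, and using the raw adjacency matrix as the proximity matrix is a simplifying assumption) of embedding nodes via a truncated SVD; for large graphs this dense SVD is exactly the expensive step the paper targets:

```python
import numpy as np

def svd_embedding(A, d):
    """Embed nodes via a rank-d truncated SVD of an n x n proximity matrix.

    A: symmetric proximity matrix (here simply the adjacency matrix).
    Returns an n x d matrix whose rows are node embeddings, with columns
    scaled by the square roots of the top-d singular values.
    """
    U, S, _ = np.linalg.svd(A, hermitian=True)
    return U[:, :d] * np.sqrt(S[:d])

# Toy example: a 4-cycle graph.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
X = svd_embedding(A, d=2)
print(X.shape)  # (4, 2)
```

The dense SVD costs O(n^3) time and O(n^2) memory, which is what motivates shrinking the graph before embedding.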

Highlights

- Over the past several years, network embeddings have been demonstrated to be a remarkably powerful tool for learning unsupervised representations for nodes in a network (Perozzi et al, 2014; Tang et al, 2015; Grover and Leskovec, 2016)
- When eliminating vertices of a graph using the Schur complement, the resulting graph perfectly preserves random walk transition probabilities through the eliminated vertex set with respect to the original graph
- Our work introduces Schur complements in the context of graph embeddings, and gives a simple random contraction rule that leads to a decrease in the edge count in the contracted graph, preserves Schur complements in expectation in each step, and performs well in practice
- We introduce two vertex sparsification algorithms based on Schur complements to be used as a preprocessing routine when computing graph embeddings of large-scale networks
- We prove that the random contraction-based scheme produces a graph that is the same in expectation as the one given by Gaussian elimination, which in turn yields the matrix factorization that random walk-based graph embeddings such as DeepWalk, Network Matrix Factorization, and NetSMF aim to approximate
- We demonstrate on commonly-used benchmarks for graph embedding-based multi-label vertex classification tasks that both of these algorithms empirically improve the prediction accuracy compared to using graph embeddings of the original, unsparsified networks, while running in less time and using substantially less memory
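
The Gaussian-elimination view above can be sketched in a few lines of numpy (function names are ours): eliminating the interior vertices of a path collapses it to a single weighted edge between the terminals, while the effective resistance between the terminals is preserved exactly.

```python
import numpy as np

def schur_complement(L, T):
    """Schur complement of Laplacian L onto terminal set T (eliminates the rest)."""
    S = [i for i in range(L.shape[0]) if i not in T]
    L_TT = L[np.ix_(T, T)]; L_TS = L[np.ix_(T, S)]
    L_SS = L[np.ix_(S, S)]; L_ST = L[np.ix_(S, T)]
    return L_TT - L_TS @ np.linalg.solve(L_SS, L_ST)

def effective_resistance(L, u, v):
    """Effective resistance between u and v via the Laplacian pseudoinverse."""
    e = np.zeros(L.shape[0]); e[u], e[v] = 1.0, -1.0
    return e @ np.linalg.pinv(L) @ e

# Path graph 0-1-2-3 with unit edge weights; eliminate interior nodes {1, 2}.
L = np.array([[ 1, -1,  0,  0],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [ 0,  0, -1,  1]], dtype=float)
L_sc = schur_complement(L, T=[0, 3])
# The path collapses to one edge of weight 1/3 between the endpoints, and the
# effective resistance between the terminals is 3 in both graphs.
```

This is the sense in which the smaller graph "perfectly preserves" the behavior of walks through the eliminated vertices.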

Methods

- The authors investigate how the Schur complement-based vertex sparsifiers affect the predictive performance of graph embeddings on two different learning tasks.
- Flickr (Tang and Liu, 2009) is a network of user contacts on the image-sharing website Flickr, and its labels represent groups interested in different types of photography.
- YouTube (Yang and Leskovec, 2015) is a social network of users of the popular video-sharing website, and its labels are user-defined groups with mutual interests in video genres.
- The authors only consider the largest connected component of the YouTube network

Results

- The authors' first observation is that the quality of the embedding for classification always improves after running the Schur complement sparsifier
- To explain this phenomenon, the authors note that LINE computes an embedding using length-1 random walks.
- For the YouTube experiment, the authors observe that the random contraction sparsifier substantially outperforms the Schur complement sparsifier and the baseline LINE embedding
- The authors attribute this behavior to the fact that contractions preserve edge sparsity unlike Schur complements.
- The authors' experiments highlight the importance of choosing the right binary operator for a given node embedding algorithm
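
The binary operators in question map a pair of node embeddings to a single edge-feature vector for link prediction. The standard choices (as popularized by node2vec; this list is our illustration, not taken from the paper's tables) are easy to state:

```python
import numpy as np

# Standard binary operators that turn two node embeddings into one edge
# feature vector; which one works best depends on the embedding method.
OPERATORS = {
    "average":  lambda u, v: (u + v) / 2,
    "hadamard": lambda u, v: u * v,
    "l1":       lambda u, v: np.abs(u - v),
    "l2":       lambda u, v: (u - v) ** 2,
}

u = np.array([1.0, -2.0, 0.5])
v = np.array([0.0,  1.0, 0.5])
features = {name: op(u, v) for name, op in OPERATORS.items()}
```

The resulting edge features are then fed to a downstream classifier, so a mismatch between operator and embedding geometry directly hurts AUC.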

Conclusion

- The authors introduce two vertex sparsification algorithms based on Schur complements to be used as a preprocessing routine when computing graph embeddings of large-scale networks.
- Both of these algorithms repeatedly choose a vertex to remove and add new edges between its neighbors.
- The authors demonstrate on commonly-used benchmarks for graph embedding-based multi-label vertex classification tasks that both of these algorithms empirically improve the prediction accuracy compared to using graph embeddings of the original, unsparsified networks, while running in less time and using substantially less memory
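
The remove-a-vertex-and-connect-its-neighbors loop can be sketched schematically (unweighted, with hypothetical names; the paper's algorithms additionally assign weights to the new edges via Schur complements or random contractions):

```python
def eliminate(adj, v):
    """Remove vertex v from an adjacency-set dict and connect its former
    neighbors pairwise. Schematic and unweighted: the real algorithms also
    set weights on the new clique edges."""
    nbrs = adj.pop(v)
    for u in nbrs:
        adj[u].discard(v)
    for u in nbrs:
        for w in nbrs:
            if u != w:
                adj[u].add(w)
    return adj

# 4-cycle 0-1-2-3-0; eliminating vertex 3 links its neighbors 0 and 2.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
eliminate(adj, 3)
# adj is now a triangle on {0, 1, 2}.
```

Repeating this until only the terminal (relevant) vertices remain yields the coarsened graph that is then embedded.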


- Table 1: Statistics of the networks in our vertex classification experiments
- Table 2: Running times of the coarsening and graph embedding stages in the vertex classification experiment (seconds)
- Table 3: Statistics of the networks in our link prediction experiments
- Table 4: Area under the curve (AUC) scores for different operators, coarsening, and embedding algorithms for the link prediction task

Related work

- The study of vertex sparsifiers is closely related to graph coarsening (Chevalier and Safro, 2009; Loukas and Vandergheynst, 2018) and to the study of core-peripheral networks (Benson and Kleinberg, 2019; Jia and Benson, 2019), where analytics are focused only on a core of vertices. In our setting, the terminal vertices play roles analogous to the core vertices. In this paper, we focus on unsupervised approaches for learning graph embeddings, which are then used as input for downstream classification tasks. There has been considerable work on semi-supervised approaches to learning on graphs (Yang et al, 2016; Kipf and Welling, 2017; Veličković et al, 2018; Thekumparampil et al, 2018), including some that exploit connections with Schur complements (Vattani et al, 2011; Wagner et al, 2018; Viswanathan et al, 2019). Our techniques have direct connections with multilevel and multiscale algorithms, which aim to use a smaller version of a problem (typically on matrices or graphs) to generate answers that can be extended to the full problem (Chen et al, 2018; Liang et al, 2018; Abu-El-Haija et al, 2019). There are well-known connections between Schur complements, random contractions, and finer grids in the multigrid literature (Briggs et al, 2000). These connections have been utilized for efficiently solving Laplacian linear systems (Kyng and Sachdeva, 2016; Kyng et al, 2016) via provable spectral approximations to the Schur complement. However, the approximations constructed by these algorithms have many more edges (by at least a factor of 1/ε²) than the original graph, limiting their practical applicability. On the other hand, our work introduces Schur complements in the context of graph embeddings, and gives a simple random contraction rule that decreases the edge count in the contracted graph, preserves Schur complements in expectation in each step, and performs well in practice.
- Graph compression techniques aimed at reducing the number of vertices have been studied for other graph primitives, including cuts/flows (Moitra, 2009; Englert et al, 2014) and shortest-path distances (Thorup and Zwick, 2005). However, the main objective of these works is to construct sparsifiers with theoretical guarantees, and to the best of our knowledge no prior work considers their practical applicability.
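
A small numpy experiment illustrating one contraction rule with the expectation property, for a single star elimination. This is our reconstruction under stated assumptions, not necessarily the paper's exact scheme: contract the center into a neighbor chosen with probability proportional to edge weight, and connect that neighbor to the remaining ones with series-combined weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Star graph: a center with neighbors {1, 2, 3} and edge weights w[i].
w = {1: 1.0, 2: 2.0, 3: 3.0}
d = sum(w.values())

# Eliminating the center via the Schur complement yields a clique on the
# neighbors with weights w_a * w_b / d.
schur = {(a, b): w[a] * w[b] / d for a in w for b in w if a < b}

def contract_once(rng):
    """Contract the center into neighbor a (picked w.p. w_a / d); each other
    neighbor b gets an edge to a with the series weight w_a*w_b / (w_a + w_b)."""
    nbrs = list(w)
    a = int(rng.choice(nbrs, p=[w[x] / d for x in nbrs]))
    return {tuple(sorted((a, b))): w[a] * w[b] / (w[a] + w[b])
            for b in nbrs if b != a}

# Averaging many contracted graphs recovers the Schur complement weights:
# E[w(a,b)] = (w_a + w_b)/d * w_a*w_b/(w_a + w_b) = w_a*w_b/d.
trials = 50_000
avg = {e: 0.0 for e in schur}
for _ in range(trials):
    for e, wt in contract_once(rng).items():
        avg[e] += wt / trials
```

Note that each contracted graph has only deg(center) - 1 new edges, versus the full clique produced by the exact Schur complement, which is the sparsity advantage mentioned above.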

Funding

- MF did part of this work while supported by an NSF Graduate Research Fellowship under grant DGE1650044 at the Georgia Institute of Technology
- SS and GG are partly supported by an NSERC (Natural Sciences and Engineering Research Council of Canada) Discovery grant awarded to SS
- RP did part of this work while at Microsoft Research Redmond, and is partially supported by the NSF under grants CCF-1637566 and CCF-1846218

Reference

- Abu-El-Haija, S., Perozzi, B., Kapoor, A., Alipourfard, N., Lerman, K., Harutyunyan, H., Steeg, G. V., and Galstyan, A. (2019). MixHop: Higher-order graph convolutional architectures via sparsified neighborhood mixing. In Proceedings of the 36th International Conference on Machine Learning, pages 21–29. PMLR.
- Batson, J., Spielman, D. A., Srivastava, N., and Teng, S.-H. (2013). Spectral sparsification of graphs: Theory and algorithms. Communications of the ACM, 56(8):87–94.
- Benson, A. and Kleinberg, J. (2019). Link prediction in networks with core-fringe data. In Proceedings of the 28th International Conference on World Wide Web, pages 94–104. ACM.
- Borgatti, S. P. and Everett, M. G. (2000). Models of core/periphery structures. Social networks, 21(4):375–395.
- Briggs, W. L., Henson, V. E., and McCormick, S. F. (2000). A Multigrid Tutorial. SIAM.
- Bringmann, K. and Panagiotou, K. (2017). Efficient sampling methods for discrete distributions. Algorithmica, 79(2):484–508.
- Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. (2014). Spectral networks and locally connected networks on graphs. In Proceedings of the 2nd International Conference on Learning Representations.
- Cao, S., Lu, W., and Xu, Q. (2015). GraRep: Learning graph representations with global structural information. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 891–900.
- Chen, H., Perozzi, B., Hu, Y., and Skiena, S. (2018). HARP: Hierarchical representation learning for networks. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pages 2127–2134.
- Cheng, D., Cheng, Y., Liu, Y., Peng, R., and Teng, S.-H. (2015). Efficient sampling for Gaussian graphical models via spectral sparsification. In Proceedings of The 28th Conference on Learning Theory (COLT), pages 364–390. PMLR.
- Chevalier, C. and Safro, I. (2009). Comparison of coarsening schemes for multilevel graph partitioning. In International Conference on Learning and Intelligent Optimization, pages 191–205. Springer.
- Cummings, R., Fahrbach, M., and Fatehpuria, A. (2019). A fast minimum degree algorithm and matching lower bound. arXiv preprint arXiv:1907.12119.
- Englert, M., Gupta, A., Krauthgamer, R., Räcke, H., Talgam-Cohen, I., and Talwar, K. (2014). Vertex sparsifiers: New results from old techniques. SIAM J. Comput., 43(4):1239–1262.
- Fahrbach, M., Miller, G. L., Peng, R., Sawlani, S., Wang, J., and Xu, S. C. (2018). Graph sketching against adaptive adversaries applied to the minimum degree algorithm. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 101–112. IEEE.
- George, A. and Liu, J. W. H. (1989). The evolution of the minimum degree ordering algorithm. SIAM Review, 31(1):1–19.
- Grover, A. and Leskovec, J. (2016). Node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, page 855–864.
- Hamilton, W. L., Ying, Z., and Leskovec, J. (2017). Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034.
- Jia, J. and Benson, A. R. (2019). Random spatial network models for core-periphery structure. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pages 366–374. ACM.
- Karypis, G. and Kumar, V. (1998). METIS: A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices.
- Kipf, T. N. and Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations.
- Kyng, R., Lee, Y. T., Peng, R., Sachdeva, S., and Spielman, D. A. (2016). Sparsified Cholesky and multigrid solvers for connection Laplacians. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 842–850.
- Kyng, R. and Sachdeva, S. (2016). Approximate Gaussian elimination for Laplacians: Fast, sparse, and simple. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 573–582.
- Leskovec, J. and Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.
- Liang, J., Gurukar, S., and Parthasarathy, S. (2018). MILE: A multi-level framework for scalable graph embedding. arXiv preprint arXiv:1802.09612.
- Liu, Y., Dighe, A., Safavi, T., and Koutra, D. (2016). Graph summarization: A survey. CoRR, abs/1612.04883.
- Loukas, A. and Vandergheynst, P. (2018). Spectrally approximating large graphs with smaller graphs. In Proceedings of the 35th International Conference on Machine Learning, pages 3237–3246. PMLR.
- Moitra, A. (2009). Approximation algorithms for multicommodity-type problems with guarantees independent of the graph size. In 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 3–12.
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
- Perozzi, B., Al-Rfou, R., and Skiena, S. (2014). Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710. ACM.
- Qiu, J., Dong, Y., Ma, H., Li, J., Wang, C., Wang, K., and Tang, J. (2019). NetSMF: Large-scale network embedding as sparse matrix factorization. In Proceedings of the 28th International Conference on World Wide Web, pages 1509–1520. ACM.
- Qiu, J., Dong, Y., Ma, H., Li, J., Wang, K., and Tang, J. (2018). Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 459–467.
- Stark, C., Breitkreutz, B.-J., Reguly, T., Boucher, L., Breitkreutz, A., and Tyers, M. (2006). BioGRID: A general repository for interaction datasets. Nucleic Acids Research, 34:D535–D539.
- Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., and Mei, Q. (2015). LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pages 1067–1077.
- Tang, L. and Liu, H. (2009). Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 817–826. ACM.
- Tang, L. and Liu, H. (2011). Leveraging social media networks for classification. Data Mining and Knowledge Discovery, 23(3):447–478.
- Thekumparampil, K. K., Wang, C., Oh, S., and Li, L.-J. (2018). Attention-based graph neural network for semi-supervised learning. arXiv preprint arXiv:1803.03735.
- Thorup, M. and Zwick, U. (2005). Approximate distance oracles. J. ACM, 52(1):1–24.
- Vattani, A., Chakrabarti, D., and Gurevich, M. (2011). Preserving personalized pagerank in subgraphs. In Proceedings of the 28th International Conference on Machine Learning, page 793–800.
- Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. (2018). Graph attention networks. In Proceedings of the 6th International Conference on Learning Representations.
- Viswanathan, K., Sachdeva, S., Tomkins, A., and Ravi, S. (2019). Improved semi-supervised learning with multiple graphs. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 3032–3041.
- Wagner, T., Guha, S., Kasiviswanathan, S., and Mishra, N. (2018). Semi-supervised learning on data streams via temporal label propagation. In Proceedings of the 35th International Conference on Machine Learning, pages 5095–5104.
- Yang, J. and Leskovec, J. (2015). Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181–213.
- Yang, Z., Cohen, W. W., and Salakhutdinov, R. (2016). Revisiting semi-supervised learning with graph embeddings. In Proceedings of the 33rd International Conference on Machine Learning, pages 40–48.
