Faster Graph Embeddings via Coarsening

Matthew Fahrbach
Gramoz Goranci
Richard Peng
Sushant Sachdeva
Chi Wang

ICML 2020.

Keywords:
large scale, network embedding, Gaussian elimination, Protein-Protein Interaction, sparsification
TL;DR:
We present an efficient graph coarsening approach, based on Schur complements, for computing the embedding of the relevant vertices

Abstract:

Graph embeddings are a ubiquitous tool for machine learning tasks, such as node classification and link prediction, on graph-structured data. However, computing the embeddings for large-scale graphs is prohibitively inefficient even if we are interested only in a small subset of relevant vertices. To address this, we present an efficient graph coarsening approach, based on Schur complements, for computing the embedding of the relevant vertices.

Introduction
  • Over the past several years, network embeddings have been demonstrated to be a remarkably powerful tool for learning unsupervised representations for nodes in a network (Perozzi et al, 2014; Tang et al, 2015; Grover and Leskovec, 2016).
  • The objective is to learn a low-dimensional vector for each node that captures the structure of its neighborhood
  • These embeddings have proved to be very effective for downstream machine learning tasks in networks such as node classification and link prediction (Tang et al, 2015; Hamilton et al, 2017).
  • The matrix factorization-based approaches typically require computing the singular value decomposition (SVD) of a large n × n matrix, which is prohibitively expensive for large-scale graphs.
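This SVD step is the scalability bottleneck that coarsening targets. As a hedged illustration (not taken from the paper's code), the sketch below shows the generic truncated-SVD recipe used by matrix-factorization embeddings; the matrix M is a stand-in for whatever dense similarity matrix such a method factorizes, and the U·sqrt(S) convention is an assumption.

```python
# Minimal sketch of a truncated-SVD embedding; M stands in for the dense
# n-by-n similarity matrix factorized by methods such as NetMF (assumption:
# the embedding is taken as U_d * sqrt(S_d), the usual convention).
import numpy as np

def svd_embedding(M, dim=128):
    # Full SVD of a dense matrix costs roughly cubic time in n.
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    # One row per vertex, scaled by the square roots of the singular values.
    return U[:, :dim] * np.sqrt(S[:dim])

# Toy usage on a small random symmetric matrix.
n = 200
A = np.random.rand(n, n)
X = svd_embedding((A + A.T) / 2, dim=16)
print(X.shape)  # (200, 16)
```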
Highlights
  • Over the past several years, network embeddings have been demonstrated to be a remarkably powerful tool for learning unsupervised representations for nodes in a network (Perozzi et al, 2014; Tang et al, 2015; Grover and Leskovec, 2016)
  • When eliminating vertices of a graph using the Schur complement, the resulting graph perfectly preserves random walk transition probabilities through the eliminated vertex set with respect to the original graph
  • Our work introduces Schur complements in the context of graph embeddings, and gives a simple random contraction rule that leads to a decrease in the edge count in the contracted graph, preserves Schur complements in expectation in each step, and performs well in practice
  • We introduce two vertex sparsification algorithms based on Schur complements to be used as a preprocessing routine when computing graph embeddings of large-scale networks (a minimal sketch of the underlying elimination rule appears after this list)
  • We prove that the random-contraction-based scheme produces a graph that is the same in expectation as the one given by Gaussian elimination, which in turn yields the matrix factorization that random walk-based graph embeddings such as DeepWalk, NetMF, and NetSMF aim to approximate
  • We demonstrate on commonly used benchmarks for graph embedding-based multi-label vertex classification tasks that both of these algorithms empirically improve prediction accuracy compared to using graph embeddings of the original, unsparsified networks, while running in less time and using substantially less memory
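To make the elimination rule concrete, here is a minimal sketch of a single Schur-complement elimination step on a weighted graph stored as a dict-of-dicts. It implements the standard Gaussian-elimination rule (eliminating v adds an edge of weight w(a,v)·w(b,v)/deg(v) between every pair of v's neighbors); the paper's elimination-ordering heuristics and data structures are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

def eliminate_vertex(adj, v):
    """Eliminate v via the Schur complement (one Gaussian-elimination step).

    adj is a dict-of-dicts adjacency map with adj[u][x] = w(u, x).  For every
    pair (a, b) of v's neighbors we add weight w(a, v) * w(b, v) / deg_w(v),
    which is exactly the Schur complement of the graph Laplacian onto the
    remaining vertices.
    """
    neighbors = adj.pop(v)
    deg_v = sum(neighbors.values())
    for (a, wa), (b, wb) in combinations(neighbors.items(), 2):
        new_w = adj[a].get(b, 0.0) + wa * wb / deg_v
        adj[a][b] = adj[b][a] = new_w
    for u in neighbors:  # drop the edges incident to v
        del adj[u][v]

# Toy check: the path a - v - b collapses to a single edge of weight 1/2.
adj = defaultdict(dict)
adj['a']['v'] = adj['v']['a'] = 1.0
adj['b']['v'] = adj['v']['b'] = 1.0
eliminate_vertex(adj, 'v')
print(dict(adj['a']))  # {'b': 0.5}
```

Applying this step to every non-relevant vertex leaves a coarsened graph on the relevant (terminal) vertices.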
Methods
  • The authors investigate how the proposed vertex sparsifiers (based on Schur complements and random contractions) affect the predictive performance of graph embeddings for two different learning tasks; a sketch of a standard evaluation protocol for the classification task follows this list.
  • Flickr (Tang and Liu, 2009) is a network of user contacts on the image-sharing website Flickr, and its labels represent groups interested in different types of photography.
  • YouTube (Yang and Leskovec, 2015) is a social network on users of the popular video-sharing website, and its labels are user-defined groups with mutual interests in video genres.
  • The authors only consider the largest connected component of the YouTube network
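For context, the sketch below shows a standard multi-label vertex-classification protocol: one-vs-rest logistic regression on the learned embeddings, scored by micro/macro F1. The paper cites scikit-learn, but the training fraction and solver settings here are assumptions, not taken from the source.

```python
# Hedged sketch of a standard multi-label node-classification evaluation.
# X: (n, d) node embeddings; Y: (n, k) binary label-indicator matrix.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

def evaluate_embedding(X, Y, train_ratio=0.1, seed=0):
    X_tr, X_te, Y_tr, Y_te = train_test_split(
        X, Y, train_size=train_ratio, random_state=seed)
    # One binary logistic-regression classifier per label.
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf.fit(X_tr, Y_tr)
    Y_hat = clf.predict(X_te)
    return (f1_score(Y_te, Y_hat, average="micro"),
            f1_score(Y_te, Y_hat, average="macro"))

# Usage: micro_f1, macro_f1 = evaluate_embedding(embeddings, labels)
```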
Results
  • The authors' first observation is that the quality of the embedding for classification always improves when the Schur-complement sparsifier is used
  • To explain this phenomenon, the authors note that LINE computes an embedding using length-1 random walks.
  • For the YouTube experiment, the authors observe that the random-contraction sparsifier substantially outperforms the Schur-complement sparsifier and the baseline LINE embedding
  • The authors attribute this behavior to the fact that contractions preserve edge sparsity, unlike Schur complements.
  • The authors' experiments highlight the importance of choosing the right binary operator for a given node embedding algorithm
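The "binary operator" here refers to how two node embeddings are combined into a single edge feature for link prediction. The sketch below lists the standard operator set introduced by node2vec (average, Hadamard, weighted-L1, weighted-L2); whether Table 4 uses exactly this set is an assumption.

```python
# Standard node2vec-style binary operators for turning a pair of node
# embeddings into an edge feature (an illustrative set; the paper's exact
# operator list is not reproduced here).
import numpy as np

EDGE_OPERATORS = {
    "average":     lambda u, v: (u + v) / 2.0,
    "hadamard":    lambda u, v: u * v,
    "weighted_l1": lambda u, v: np.abs(u - v),
    "weighted_l2": lambda u, v: (u - v) ** 2,
}

def edge_features(emb, edges, op="hadamard"):
    """emb: (n, d) embedding matrix; edges: iterable of (u, v) index pairs."""
    combine = EDGE_OPERATORS[op]
    return np.array([combine(emb[u], emb[v]) for u, v in edges])
```

The resulting edge features are then fed to a binary classifier and scored by AUC, as in Table 4.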
Conclusion
  • The authors introduce two vertex sparsification algorithms based on Schur complements to be used as a preprocessing routine when computing graph embeddings of large-scale networks.
  • Both of these algorithms repeatedly choose a vertex to remove and add new edges between its neighbors.
  • The authors demonstrate on commonly used benchmarks for graph embedding-based multi-label vertex classification tasks that both of these algorithms empirically improve prediction accuracy compared to using graph embeddings of the original, unsparsified networks, while running in less time and using substantially less memory
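The random-contraction variant is only described at a high level above. As a loudly hedged illustration (not necessarily the paper's exact rule), the sketch below gives one simple randomized elimination whose expected output matches the Schur-complement clique: sample a neighbor u of v with probability w(u,v)/deg(v) and route v's other neighbors to u with half of their weight to v. By linearity of expectation, the expected weight added between any pair (a, b) of v's neighbors is w(a,v)·w(b,v)/deg(v), i.e. the Schur-complement edge weight.

```python
# Illustrative randomized elimination (an assumption, not the paper's exact
# rule): contract v into a neighbor u sampled proportionally to w(u, v) and
# give the remaining neighbors edges to u with half of their weight to v.
# In expectation this adds w(a, v) * w(b, v) / deg_w(v) between each pair
# (a, b) of v's neighbors, while creating at most deg(v) - 1 new edges
# instead of a full clique.
import random

def contract_vertex(adj, v, rng=random):
    neighbors = adj.pop(v)
    u = rng.choices(list(neighbors), weights=list(neighbors.values()))[0]
    for x, wx in neighbors.items():
        if x == u:
            continue
        new_w = adj[u].get(x, 0.0) + wx / 2.0
        adj[u][x] = adj[x][u] = new_w
    for x in neighbors:  # drop the edges incident to v
        del adj[x][v]
```

Keeping the edge count from growing in this way is the property the Results attribute to the contraction-based variant's advantage on the YouTube network.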
Tables
  • Table 1: Statistics of the networks in our vertex classification experiments
  • Table 2: Running times (in seconds) of the coarsening and graph embedding stages in the vertex classification experiment
  • Table 3: Statistics of the networks in our link prediction experiments
  • Table 4: Area under the curve (AUC) scores for different operators, coarsening, and embedding algorithms for the link prediction task
Related work
  • The study of vertex sparsifiers is closely related to graph coarsening (Chevalier and Safro, 2009; Loukas and Vandergheynst, 2018) and the study of core-periphery networks (Benson and Kleinberg, 2019; Jia and Benson, 2019), where analytics are focused only on a core of vertices. In our setting, the terminal vertices play roles analogous to the core vertices. In this paper, we focus on unsupervised approaches for learning graph embeddings, which are then used as input for downstream classification tasks. There has been considerable work on semi-supervised approaches to learning on graphs (Yang et al, 2016; Kipf and Welling, 2017; Veličković et al, 2018; Thekumparampil et al, 2018), including some that exploit connections with Schur complements (Vattani et al, 2011; Wagner et al, 2018; Viswanathan et al, 2019). Our techniques have direct connections with multilevel and multiscale algorithms, which aim to use a smaller version of a problem (typically on matrices or graphs) to generate answers that can be extended to the full problem (Chen et al, 2018; Liang et al, 2018; Abu-El-Haija et al, 2019). There are well-known connections between Schur complements, random contractions, and finer grids in the multigrid literature (Briggs et al, 2000). These connections have been utilized for efficiently solving Laplacian linear systems (Kyng and Sachdeva, 2016; Kyng et al, 2016) via provable spectral approximations to the Schur complement. However, the approximations constructed by these algorithms have many more edges (by at least a factor of 1/ε²) than the original graph, limiting their practical applicability. In contrast, our work introduces Schur complements in the context of graph embeddings and gives a simple random contraction rule that decreases the edge count of the contracted graph, preserves Schur complements in expectation in each step, and performs well in practice. Graph compression techniques aimed at reducing the number of vertices have also been studied for other graph primitives, including cuts/flows (Moitra, 2009; Englert et al, 2014) and shortest-path distances (Thorup and Zwick, 2005). However, the main objective of those works is to construct sparsifiers with theoretical guarantees, and to the best of our knowledge no prior work considers their practical applicability.
Funding
  • MF did part of this work while supported by an NSF Graduate Research Fellowship under grant DGE1650044 at the Georgia Institute of Technology
  • SS and GG are partly supported by an NSERC (Natural Sciences and Engineering Research Council of Canada) Discovery grant awarded to SS
  • RP did part of this work while at Microsoft Research Redmond, and is partially supported by the NSF under grants CCF-1637566 and CCF-1846218
References
  • Abu-El-Haija, S., Perozzi, B., Kapoor, A., Alipourfard, N., Lerman, K., Harutyunyan, H., Steeg, G. V., and Galstyan, A. (2019). MixHop: Higher-order graph convolutional architectures via sparsified neighborhood mixing. In Proceedings of the 36th International Conference on Machine Learning, pages 21–29. PMLR.
  • Batson, J., Spielman, D. A., Srivastava, N., and Teng, S.-H. (2013). Spectral sparsification of graphs: Theory and algorithms. Communications of the ACM, 56(8):87–94.
  • Benson, A. and Kleinberg, J. (2019). Link prediction in networks with core-fringe data. In Proceedings of the 28th International Conference on World Wide Web, pages 94–104. ACM.
  • Borgatti, S. P. and Everett, M. G. (2000). Models of core/periphery structures. Social Networks, 21(4):375–395.
  • Briggs, W. L., Henson, V. E., and McCormick, S. F. (2000). A Multigrid Tutorial. SIAM.
  • Bringmann, K. and Panagiotou, K. (2017). Efficient sampling methods for discrete distributions. Algorithmica, 79(2):484–508.
  • Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. (2014). Spectral networks and locally connected networks on graphs. In Proceedings of the 2nd International Conference on Learning Representations.
  • Cao, S., Lu, W., and Xu, Q. (2015). GraRep: Learning graph representations with global structural information. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 891–900.
  • Chen, H., Perozzi, B., Hu, Y., and Skiena, S. (2018). HARP: Hierarchical representation learning for networks. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pages 2127–2134.
  • Cheng, D., Cheng, Y., Liu, Y., Peng, R., and Teng, S.-H. (2015). Efficient sampling for Gaussian graphical models via spectral sparsification. In Proceedings of the 28th Conference on Learning Theory (COLT), pages 364–390. PMLR.
  • Chevalier, C. and Safro, I. (2009). Comparison of coarsening schemes for multilevel graph partitioning. In International Conference on Learning and Intelligent Optimization, pages 191–205. Springer.
  • Cummings, R., Fahrbach, M., and Fatehpuria, A. (2019). A fast minimum degree algorithm and matching lower bound. arXiv preprint arXiv:1907.12119.
  • Englert, M., Gupta, A., Krauthgamer, R., Räcke, H., Talgam-Cohen, I., and Talwar, K. (2014). Vertex sparsifiers: New results from old techniques. SIAM Journal on Computing, 43(4):1239–1262.
  • Fahrbach, M., Miller, G. L., Peng, R., Sawlani, S., Wang, J., and Xu, S. C. (2018). Graph sketching against adaptive adversaries applied to the minimum degree algorithm. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 101–112. IEEE.
  • George, A. and Liu, J. W. H. (1989). The evolution of the minimum degree ordering algorithm. SIAM Review, 31(1):1–19.
  • Grover, A. and Leskovec, J. (2016). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864.
  • Hamilton, W. L., Ying, Z., and Leskovec, J. (2017). Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034.
  • Jia, J. and Benson, A. R. (2019). Random spatial network models for core-periphery structure. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pages 366–374. ACM.
  • Karypis, G. and Kumar, V. (1998). A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices.
  • Kipf, T. N. and Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations.
  • Kyng, R., Lee, Y. T., Peng, R., Sachdeva, S., and Spielman, D. A. (2016). Sparsified Cholesky and multigrid solvers for connection Laplacians. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 842–850.
  • Kyng, R. and Sachdeva, S. (2016). Approximate Gaussian elimination for Laplacians: Fast, sparse, and simple. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 573–582.
  • Leskovec, J. and Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.
  • Liang, J., Gurukar, S., and Parthasarathy, S. (2018). MILE: A multi-level framework for scalable graph embedding. arXiv preprint arXiv:1802.09612.
  • Liu, Y., Dighe, A., Safavi, T., and Koutra, D. (2016). Graph summarization: A survey. CoRR, abs/1612.04883.
  • Loukas, A. and Vandergheynst, P. (2018). Spectrally approximating large graphs with smaller graphs. In Proceedings of the 35th International Conference on Machine Learning, pages 3237–3246. PMLR.
  • Moitra, A. (2009). Approximation algorithms for multicommodity-type problems with guarantees independent of the graph size. In 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 3–12.
  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
  • Perozzi, B., Al-Rfou, R., and Skiena, S. (2014). DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710. ACM.
  • Qiu, J., Dong, Y., Ma, H., Li, J., Wang, C., Wang, K., and Tang, J. (2019). NetSMF: Large-scale network embedding as sparse matrix factorization. In Proceedings of the 28th International Conference on World Wide Web, pages 1509–1520. ACM.
  • Qiu, J., Dong, Y., Ma, H., Li, J., Wang, K., and Tang, J. (2018). Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 459–467.
  • Stark, C., Breitkreutz, B.-J., Reguly, T., Boucher, L., Breitkreutz, A., and Tyers, M. (2006). BioGRID: A general repository for interaction datasets. Nucleic Acids Research, 34:D535–D539.
  • Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., and Mei, Q. (2015). LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pages 1067–1077.
  • Tang, L. and Liu, H. (2009). Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 817–826. ACM.
  • Tang, L. and Liu, H. (2011). Leveraging social media networks for classification. Data Mining and Knowledge Discovery, 23(3):447–478.
  • Thekumparampil, K. K., Wang, C., Oh, S., and Li, L.-J. (2018). Attention-based graph neural network for semi-supervised learning. arXiv preprint arXiv:1803.03735.
  • Thorup, M. and Zwick, U. (2005). Approximate distance oracles. Journal of the ACM, 52(1):1–24.
  • Vattani, A., Chakrabarti, D., and Gurevich, M. (2011). Preserving personalized PageRank in subgraphs. In Proceedings of the 28th International Conference on Machine Learning, pages 793–800.
  • Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. (2018). Graph attention networks. In Proceedings of the 6th International Conference on Learning Representations.
  • Viswanathan, K., Sachdeva, S., Tomkins, A., and Ravi, S. (2019). Improved semi-supervised learning with multiple graphs. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 3032–3041.
  • Wagner, T., Guha, S., Kasiviswanathan, S., and Mishra, N. (2018). Semi-supervised learning on data streams via temporal label propagation. In Proceedings of the 35th International Conference on Machine Learning, pages 5095–5104.
  • Yang, J. and Leskovec, J. (2015). Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181–213.
  • Yang, Z., Cohen, W. W., and Salakhutdinov, R. (2016). Revisiting semi-supervised learning with graph embeddings. In Proceedings of the 33rd International Conference on Machine Learning, pages 40–48.