Scalable Global Alignment Graph Kernel Using Random Features: From Node Embedding to Graph Embedding

pp. 1418-1428, 2019.

DOI: https://doi.org/10.1145/3292500.3330918

Abstract:

Graph kernels are widely used for measuring the similarity between graphs. Many existing graph kernels, which focus on local patterns within graphs rather than their global properties, suffer from significant structural information loss when representing graphs. Some recent global graph kernels, which utilize the alignment of geometric node embeddings, …

Introduction
  • Graph kernels are one of the most important methods for graph data analysis and have been successfully applied in diverse fields such as disease and brain analysis [6, 21], chemical analysis [25], image action recognition and scene modeling [8, 37], and malware analysis [36].
  • Much effort has been devoted to designing feature spaces or kernel functions for capturing similarities between structural properties of graphs.
  • The first line of research focuses on local patterns within graphs [9, 28]
  • These kernels recursively decompose the graphs into small sub-structures, and define a feature map over these sub-structures for the resulting graph kernel (a toy illustration follows this list).
  • Most of these graph kernels scale poorly to large graphs because their time complexity is at least quadratic in the number of graphs and cubic in the size of the graphs
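To make the decomposition idea concrete, here is a minimal, hypothetical sketch (not any specific kernel from the literature): each graph is mapped to counts of a trivial sub-structure, node degrees, and the kernel value is the dot product of the two count vectors. Richer kernels swap in graphlets, subtrees, or walks for the degree histogram; the function names are illustrative only.

```python
import numpy as np

def substructure_features(A, max_degree=10):
    # Toy feature map in the R-convolution spirit: represent a graph
    # (adjacency matrix A) by a histogram of node degrees, a stand-in
    # for richer sub-structure counts such as graphlets or subtrees.
    degrees = A.sum(axis=1).astype(int)
    return np.bincount(np.clip(degrees, 0, max_degree),
                       minlength=max_degree + 1).astype(float)

def decomposition_kernel(A1, A2):
    # Kernel value: dot product of the two graphs' sub-structure counts.
    return substructure_features(A1) @ substructure_features(A2)

# Usage: compare a triangle with a 3-node path.
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(decomposition_kernel(triangle, path))  # 3.0: overlap on degree-2 nodes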
Highlights
  • Graph kernels are one of the most important methods for graph data analysis and have been successfully applied in diverse fields such as disease and brain analysis [6, 21], chemical analysis [25], image action recognition and scene modeling [8, 37], and malware analysis [36]
  • These global graph kernels based on matching node embeddings between graphs may suffer from the loss of positive definiteness. The majority of these approaches have at least quadratic complexity in terms of either the number of graph samples or the size of the graph. To address these limitations of existing graph kernels, we propose a new family of global graph kernels that take into account the global properties of graphs, based on recent advances in the distance kernel learning framework [42]
  • We propose a class of p.d. global alignment graph kernels based on global properties of graphs, derived from geometric node embeddings and the corresponding node transportation
  • By efficiently approximating the proposed global alignment graph kernel using "random graph embeddings" (RGE), we obtain the benefits of both improved accuracy and reduced computational complexity (a sketch of this construction follows this list)
  • We have presented a new family of p.d. and scalable global graph kernels that take into account global properties of graphs
  • The benefits of RGE are demonstrated by its much higher graph classification accuracy compared with other graph kernels and its linear scalability in terms of the number of graphs and graph size
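The sketch below illustrates the random-feature construction suggested by these highlights, under stated assumptions: node embeddings are given as a matrix, random graphs are sampled as small sets of uniform random vectors, and the transportation distance is solved as a small linear program. The parameters R, D_max, and gamma, and the helper names, are illustrative choices, not the paper's exact scheme.

```python
import numpy as np
from scipy.optimize import linprog

def emd(X, Y):
    # Transportation (earth mover's) distance between two node-embedding
    # sets X (n x d) and Y (m x d), each node carrying uniform mass.
    n, m = len(X), len(Y)
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2).ravel()
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):                  # flow out of node i of X sums to 1/n
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):                  # flow into node j of Y sums to 1/m
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([np.full(n, 1.0 / n), np.full(m, 1.0 / m)])
    return linprog(cost, A_eq=A_eq, b_eq=b_eq, method="highs").fun

def random_graph_embedding(node_embs, R=32, D_max=6, gamma=1.0, seed=0):
    # Map one graph (its set of node embeddings) to an R-dimensional
    # feature vector; the dot product of two such vectors approximates
    # a global alignment kernel. Fixing the seed is essential: every
    # input graph must be compared against the SAME random graphs.
    rng = np.random.default_rng(seed)
    d = node_embs.shape[1]
    feats = np.empty(R)
    for r in range(R):
        size = rng.integers(1, D_max + 1)           # random graph size
        W = rng.uniform(-1.0, 1.0, size=(size, d))  # its random node embeddings
        feats[r] = np.exp(-gamma * emd(node_embs, W))
    return feats / np.sqrt(R)
```

Because the feature map is explicit, the inner product random_graph_embedding(X_i) @ random_graph_embedding(X_j) is p.d. by construction, and embedding N graphs costs time linear in N rather than quadratic.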
Methods
  • The authors performed experiments to demonstrate the effectiveness and efficiency of the proposed method, and compared against a total of twelve graph kernels and deep graph neural networks on nine benchmark datasets widely used for testing the performance of graph kernels.
  • The authors applied the method to widely-used graph classification benchmarks from multiple domains [29, 34, 44]; MUTAG, PTC-MR, ENZYMES, PROTEINS, NCI1, and NCI109 are graphs derived from small molecules and macromolecules, and IMDB-B, IMDB-M, and COLLAB are derived from social networks.
  • All bioinformatics graph datasets have node labels, while the social network graphs have no node labels
  • Detailed descriptions of these 9 datasets, including statistical properties, are provided in the Appendix
Results
  • The authors perform experiments to demonstrate the effectiveness and efficiency of the proposed method, comparing against a total of 12 graph kernels and deep graph neural networks on 9 benchmark datasets widely used for testing the performance of graph kernels.
  • The authors use multithreading with a total of 12 threads in all experiments.
  • All computations were carried out on a Dell dual-socket system with Intel Xeon processors at 2.93 GHz, for a total of 16 cores, and 250 GB of memory, running the SUSE Linux operating system.
Conclusion
  • The authors have presented a new family of p.d. and scalable global graph kernels that take into account global properties of graphs.
  • The benefits of RGE are demonstrated by its much higher graph classification accuracy compared with other graph kernels and its linear scalability in terms of the number of graphs and graph size.
  • Several interesting directions for future work are indicated: i) the graph embeddings generated by the technique can be applied and generalized to other learning problems such as graph matching or searching; ii) the RGE kernel should be extended to graphs with continuous node attributes and edge attributes.
Summary
  • Objectives:

    The authors' goal is to measure the similarity between a pair of graphs (Gi, Gj) using a proper distance measure.
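One plausible instantiation of such a distance is sketched below, under assumptions: geometric node embeddings are taken from eigenvectors of the graph's normalized Laplacian (a common spectral choice, not necessarily the paper's exact construction), and the distance between two graphs is the transportation cost between their embedding sets, reusing the emd helper from the RGE sketch above.

```python
import numpy as np

def node_embeddings(A, k=2):
    # Geometric node embeddings from the normalized Laplacian: each node
    # is represented by its entries in the k smallest nontrivial
    # eigenvectors (one plausible choice of "geometric" embedding).
    deg = A.sum(axis=1)
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(deg), 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:k + 1]  # skip the trivial smallest eigenvector

def graph_distance(A1, A2, k=2):
    # Global alignment distance between two graphs: the transportation
    # cost between their node-embedding sets (emd as defined in the
    # RGE sketch above).
    return emd(node_embeddings(A1, k), node_embeddings(A2, k))
```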
Tables
  • Table 1: Comparison of classification accuracy against graph kernel methods without node labels
  • Table 2: Comparison of classification accuracy against graph kernel methods with node labels or WL technique
  • Table 3: Comparison of classification accuracy against recent deep learning models on graphs
  • Table 4: Properties of the datasets
Related work
  • In this section, we first briefly survey existing graph kernels and then detail the difference between the conventional random features method for vector inputs [24] and our random features method for structured inputs.

    2.1 Graph Kernels

    Generally speaking, we can categorize the existing graph kernels into two groups: kernels based on local sub-structures, and kernels based on global properties.

    The first group of graph kernels compares sub-structures of graphs, following a general kernel-learning framework, i.e., R-convolution for discrete objects [10]. The major difference among these graph kernels lies in how they define and enumerate the sub-structures underlying the kernel, including random walks [9], shortest paths [4], cycles [12], subtree patterns [28], and graphlets [30]. A thread of research attempts to utilize node label information using the Weisfeiler-Lehman (WL) test of isomorphism [29] and takes structural similarity between sub-structures into account [44, 45] to further improve the performance of these kernels.
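For intuition, here is a minimal sketch of one WL refinement pass, the relabeling step that such kernels iterate before counting label frequencies; the compress-to-integers scheme is a common implementation choice, not the only one.

```python
def wl_refine(adj_list, labels):
    # One Weisfeiler-Lehman refinement pass: each node's new label is a
    # compressed code for its own label plus the sorted multiset of its
    # neighbors' labels. Iterating this and counting label frequencies
    # yields the WL subtree feature map.
    signatures = [
        (labels[v], tuple(sorted(labels[u] for u in adj_list[v])))
        for v in range(len(adj_list))
    ]
    # Compress each distinct signature to a small integer label.
    table = {}
    return [table.setdefault(sig, len(table)) for sig in signatures]

# Usage: a 4-node path with uniform initial labels.
adj = [[1], [0, 2], [1, 3], [2]]
print(wl_refine(adj, [0, 0, 0, 0]))  # [0, 1, 1, 0]: endpoints vs. interior
```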

    Recently, a new class of graph kernels has been proposed that uses geometric node embeddings of graphs to capture global properties. These kernels have achieved state-of-the-art performance in the graph classification task [14, 15, 23]. The first global kernel was based on the Lovász number [20] and its associated orthonormal representation [14]. However, these kernels can only be applied to unlabelled graphs. Later approaches directly learn graph embeddings by using landmarks [15] or compute a similarity matrix [23] by exploiting different matching schemes between geometric embeddings of the nodes of a pair of graphs. Unfortunately, the resulting kernel matrix does not yield a valid p.d. kernel, which precludes the direct use of kernel support vector machines. Two recent graph kernels, the multiscale Laplacian kernel [16] and the optimal assignment kernel [17], were developed to overcome these limitations by building a p.d. kernel between node distributions or via histogram intersection.
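The loss of positive definiteness is straightforward to probe empirically: build a similarity matrix from optimal node matchings (in the spirit of the kernels above; the matching_similarity helper below is hypothetical) and inspect its spectrum. Any negative eigenvalue certifies that the matrix is not a valid kernel matrix.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_similarity(X, Y):
    # Similarity from an optimal one-to-one node assignment between two
    # equally sized embedding sets (the matching style discussed above).
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(C)
    return np.exp(-C[rows, cols].sum())

rng = np.random.default_rng(0)
graphs = [rng.normal(size=(4, 2)) for _ in range(20)]
K = np.array([[matching_similarity(Xi, Xj) for Xj in graphs] for Xi in graphs])
# A negative value here would certify that K is not positive semi-definite.
print(np.linalg.eigvalsh(K).min())
```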
Reference
  • Rami Al-Rfou, Dustin Zelle, and Bryan Perozzi. 2019. DDGK: Learning Graph Representations for Deep Divergence Graph Kernels. arXiv:1904.09671 (2019).
  • James Atwood, Siddharth Pal, Don Towsley, and Ananthram Swami. 2016. Sparse Diffusion-Convolutional Neural Networks. In NIPS.
  • Francis Bach. 2017. On the equivalence between kernel quadrature rules and random feature expansions. Journal of Machine Learning Research 18, 21 (2017), 1–38.
  • Karsten M Borgwardt and Hans-Peter Kriegel. 2005. Shortest-path kernels on graphs. In Data Mining, Fifth IEEE International Conference on. IEEE, 8 pp.
  • François Bourgeois and Jean-Claude Lassalle. 1971. An extension of the Munkres algorithm for the assignment problem to rectangular matrices. Commun. ACM 14, 12 (1971), 802–804.
  • Pin-Yu Chen and Lingfei Wu. 2017. Revisiting spectral graph clustering with generative community models. In ICDM. 51–60.
  • Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9, Aug (2008), 1871–1874.
  • Matthew Fisher, Manolis Savva, and Pat Hanrahan. 2011. Characterizing structural relationships in scenes using graph kernels. ACM Transactions on Graphics (TOG) 30, 4 (2011), 34.
  • Thomas Gärtner, Peter Flach, and Stefan Wrobel. 2003. On graph kernels: Hardness results and efficient alternatives. In Learning Theory and Kernel Machines. Springer, 129–143.
  • David Haussler. 1999. Convolution kernels on discrete structures. Technical Report. Department of Computer Science, University of California at Santa Cruz.
  • Frank L Hitchcock. 1941. The distribution of a product from several sources to numerous localities. Studies in Applied Mathematics 20, 1-4 (1941), 224–230.
  • Tamás Horváth, Thomas Gärtner, and Stefan Wrobel. 2004. Cyclic pattern kernels for predictive graph mining. In KDD. ACM, 158–167.
  • Catalin Ionescu, Alin Popa, and Cristian Sminchisescu. 2017. Large-scale data-dependent kernel approximation. In Artificial Intelligence and Statistics. 19–27.
  • Fredrik Johansson, Vinay Jethava, Devdatt Dubhashi, and Chiranjib Bhattacharyya. 2014. Global graph kernels using geometric embeddings. In ICML.
  • Fredrik D Johansson and Devdatt Dubhashi. 2015. Learning with similarity functions on graphs using matchings of geometric embeddings. In KDD. ACM, 467–476.
  • Risi Kondor and Horace Pan. 2016. The multiscale laplacian graph kernel. In NIPS. 2990–2998.
  • Nils M Kriege, Pierre-Louis Giscard, and Richard Wilson. 2016. On valid optimal assignment kernels and applications to graph classification. In NIPS. 1623–1631.
  • Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. 2015. From word embeddings to document distances. In ICML. 957–966.
  • Quoc Le, Tamás Sarlós, and Alex Smola. 2013. Fastfood: Approximating kernel expansions in loglinear time. In ICML, Vol. 85.
  • László Lovász. 1979. On the Shannon capacity of a graph. IEEE Transactions on Information Theory 25, 1 (1979), 1–7.
  • Fatemeh Mokhtari and Gholam-Ali Hossein-Zadeh. 2013. Decoding brain states using backward edge elimination and graph kernels in fMRI connectivity networks. Journal of Neuroscience Methods 212, 2 (2013), 259–268.
  • Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In ICML. 2014–2023.
  • Giannis Nikolentzos, Polykarpos Meladianos, and Michalis Vazirgiannis. 2017. Matching Node Embeddings for Graph Similarity. In AAAI. 2429–2435.
  • Ali Rahimi and Benjamin Recht. 2008. Random features for large-scale kernel machines. In NIPS. 1177–1184.
  • Liva Ralaivola, Sanjay J Swamidass, Hiroto Saigo, and Pierre Baldi. 2005. Graph kernels for chemical informatics. Neural Networks 18, 8 (2005), 1093–1110.
  • Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. 2000. The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision 40, 2 (2000), 99–121. http://ai.stanford.edu/rubner/emd/default.htm
  • Alessandro Rudi and Lorenzo Rosasco. 2017. Generalization properties of learning with random features. In NIPS. 3218–3228.
  • Nino Shervashidze and Karsten M Borgwardt. 2009. Fast subtree kernels on graphs. In NIPS. 1660–1668.
  • Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M Borgwardt. 2011. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research 12, Sep (2011), 2539–2561.
  • Nino Shervashidze, SVN Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten Borgwardt. 2009. Efficient graphlet kernels for large graph comparison. In AISTATS. 488–495.
  • Aman Sinha and John C Duchi. 2016. Learning kernels with random features. In NIPS. 1298–1306.
  • Justin Solomon, Raif Rustamov, Leonidas Guibas, and Adrian Butscher. 2016. Continuous-flow graph transportation distances. arXiv:1603.06927 (2016).
  • Andreas Stathopoulos and James R McCombs. 2010. PRIMME: preconditioned iterative multimethod eigensolver: methods and software description. ACM Transactions on Mathematical Software (TOMS) 37, 2 (2010), 21.
  • S Vichy N Vishwanathan, Nicol N Schraudolph, Risi Kondor, and Karsten M Borgwardt. 2010. Graph kernels. Journal of Machine Learning Research 11 (2010), 1201–1242.
  • Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and Computing 17, 4 (2007), 395–416.
  • Cynthia Wagner, Gerard Wagener, Radu State, and Thomas Engel. 2009. Malware analysis with graph kernels and support vector machines. In Malicious and Unwanted Software, 2009 4th International Conference on. IEEE, 63–68.
  • Ling Wang and Hichem Sahbi. 2013. Directed acyclic graph kernels for action recognition. In ICCV. IEEE, 3168–3175.
  • Bo Wu, Yang Liu, Bo Lang, and Lei Huang. 2017. DGCNN: Disordered Graph Convolutional Neural Network Based on the Gaussian Mixture Model. arXiv:1712.03563 (2017).
  • Lingfei Wu, Pin-Yu Chen, Ian En-Hsu Yen, Fangli Xu, Yinglong Xia, and Charu Aggarwal. 2018. Scalable spectral clustering using random binning features. In KDD. ACM, 2506–2515.
  • Lingfei Wu, Eloy Romero, and Andreas Stathopoulos. 2017. PRIMME_SVDS: A high-performance preconditioned SVD solver for accurate large-scale computations. SIAM Journal on Scientific Computing 39, 5 (2017), S248–S271.
  • Lingfei Wu, Ian EH Yen, Jie Chen, and Rui Yan. 2016. Revisiting random binning features: Fast convergence and strong parallelizability. In KDD. ACM, 1265–1274.
  • Lingfei Wu, Ian En-Hsu Yen, Fangli Xu, Pradeep Ravikumar, and Michael Witbrock. 2018. D2KE: From Distance to Kernel and Embedding. arXiv:1802.04956 (2018).
  • Lingfei Wu, Ian En-Hsu Yen, Jinfeng Yi, Fangli Xu, Qi Lei, and Michael Witbrock. 2018. Random Warping Series: A Random Features Method for Time-Series Embedding. In International Conference on Artificial Intelligence and Statistics. 793–802.
  • Pinar Yanardag and SVN Vishwanathan. 2015. Deep graph kernels. In KDD. ACM, 1365–1374.
  • Pinar Yanardag and SVN Vishwanathan. 2015. A structural smoothing framework for robust graph comparison. In NIPS. 2134–2142.
  • Zhen Zhang, Mianzhi Wang, Yijian Xiang, Yan Huang, and Arye Nehorai. 2018. RetGK: Graph Kernels based on Return Probabilities of Random Walks. In NIPS. 3968–3978.