# Scalable Global Alignment Graph Kernel Using Random Features: From Node Embedding to Graph Embedding

pp. 1418-1428, 2019.

Abstract:

Graph kernels are widely used for measuring the similarity between graphs. Many existing graph kernels, which focus on local patterns within graphs rather than their global properties, suffer from significant structural information loss when representing graphs. Some recent global graph kernels, which utilize the alignment of geometric no…

Introduction

- Graph kernels are one of the most important methods for graph data analysis and have been successfully applied in diverse fields such as disease and brain analysis [6, 21], chemical analysis [25], image action recognition and scene modeling [8, 37], and malware analysis [36].
- Much effort has been devoted to designing feature spaces or kernel functions for capturing similarities between structural properties of graphs.
- The first line of research focuses on local patterns within graphs [9, 28]
- These kernels recursively decompose the graphs into small sub-structures, and define a feature map over these sub-structures for the resulting graph kernel.
- Most of these graph kernels scale poorly to large graphs due to their at least quadratic time complexity in terms of the number of graphs and cubic time complexity in terms of the size of graphs

Highlights

- Graph kernels are one of the most important methods for graph data analysis and have been successfully applied in diverse fields such as disease and brain analysis [6, 21], chemical analysis [25], image action recognition and scene modeling [8, 37], and malware analysis [36]
- These global graph kernels based on matching node embeddings between graphs may suffer from the loss of positive definiteness. The majority of these approaches have at least quadratic complexity in terms of either the number of graph samples or the size of the graph. To address these limitations of existing graph kernels, we propose a new family of global graph kernels that take into account the global properties of graphs, based on recent advances in the distance kernel learning framework [42]
- We propose a class of p.d. global alignment graph kernels based on their global properties derived from geometric node embeddings and the corresponding node transportation
- By efficiently approximating the proposed global alignment graph kernel using "random graph embeddings" (RGE), we obtain the benefits of both improved accuracy and reduced computational complexity
- We have presented a new family of p.d. and scalable global graph kernels that take into account global properties of graphs
- The benefits of RGE are demonstrated by its much higher graph classification accuracy compared with other graph kernels and its linear scalability in terms of the number of graphs and graph size
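The highlights above describe the approximation step: a graph's transportation distance to a collection of random node sets, passed through an exponential map, yields an explicit feature vector whose inner product approximates a global alignment kernel. A minimal sketch of that idea follows; the exact EMD solver, the feature count `D`, the range of the random point sets, and the implicit scale of 1 in `exp(-EMD)` are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np
from scipy.optimize import linprog

def emd(X, Y):
    """Earth mover's distance between two uniformly weighted point
    sets (rows of X and Y), solved exactly as a small linear program."""
    n, m = len(X), len(Y)
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2).ravel()
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):                       # each source ships out mass 1/n
        A_eq[i, i * m:(i + 1) * m] = 1
    for j in range(m):                       # each sink receives mass 1/m
        A_eq[n + j, j::m] = 1
    b_eq = np.concatenate([np.full(n, 1 / n), np.full(m, 1 / m)])
    return linprog(cost, A_eq=A_eq, b_eq=b_eq, method="highs").fun

def rge_embedding(node_embeddings, D=32, max_nodes=6, seed=0):
    """One random feature per randomly drawn point set: exp(-EMD) between
    the graph's node-embedding set and the random set.  The dot product
    of two such vectors approximates a global alignment kernel."""
    rng = np.random.default_rng(seed)
    dim = node_embeddings.shape[1]
    feats = [np.exp(-emd(node_embeddings,
                         rng.uniform(-1, 1, (rng.integers(1, max_nodes + 1), dim))))
             for _ in range(D)]
    return np.array(feats) / np.sqrt(D)
```

Given node-embedding matrices `U_i` and `U_j` for two graphs, `rge_embedding(U_i) @ rge_embedding(U_j)` plays the role of a kernel entry. Because the features are explicit, the resulting Gram matrix is positive definite by construction, and building it costs time linear in the number of graphs.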

Methods

- The authors performed experiments to demonstrate the effectiveness and efficiency of the proposed method, and compared against a total of twelve graph kernels and deep graph neural networks on nine benchmark datasets widely used for testing the performance of graph kernels.
- The authors applied the method to widely-used graph classification benchmarks from multiple domains [29, 34, 44]; MUTAG, PTC-MR, ENZYMES, PROTEINS, NCI1, and NCI109 are graphs derived from small molecules and macromolecules, and IMDB-B, IMDB-M, and COLLAB are derived from social networks.
- All bioinformatics graph datasets have node labels while all other social network graphs have no node labels
- Detailed descriptions of these 9 datasets, including statistical properties, are provided in the Appendix

Results

- The authors perform experiments to demonstrate the effectiveness and efficiency of the proposed method, and compare against a total of 12 graph kernels and deep graph neural networks on 9 benchmark datasets that are widely used for testing the performance of graph kernels.
- The authors use multithreading with a total of 12 threads in all experiments.
- All computations were carried out on a DELL dual-socket system with Intel Xeon processors at 2.93 GHz for a total of 16 cores and 250 GB of memory, running the SUSE Linux operating system.

Conclusion

- The authors have presented a new family of p.d. and scalable global graph kernels that take into account global properties of graphs.
- The benefits of RGE are demonstrated by its much higher graph classification accuracy compared with other graph kernels and its linear scalability in terms of the number of graphs and graph size.
- Several interesting directions for future work are indicated: i) the graph embeddings generated by the technique can be applied and generalized to other learning problems such as graph matching or searching; ii) the RGE kernel should be extended to graphs with continuous node attributes and edge attributes

Summary

## Introduction:

Graph kernels are one of the most important methods for graph data analysis and have been successfully applied in diverse fields such as disease and brain analysis [6, 21], chemical analysis [25], image action recognition and scene modeling [8, 37], and malware analysis [36].
- Much effort has been devoted to designing feature spaces or kernel functions for capturing similarities between structural properties of graphs.
- The first line of research focuses on local patterns within graphs [9, 28]
- These kernels recursively decompose the graphs into small sub-structures, and define a feature map over these sub-structures for the resulting graph kernel.
- Most of these graph kernels scale poorly to large graphs due to their at least quadratic time complexity in terms of the number of graphs and cubic time complexity in terms of the size of graphs
## Objectives:

The authors' goal is to measure the similarity between a pair of graphs (Gi, Gj) using a proper distance measure.

## Methods:

The authors performed experiments to demonstrate the effectiveness and efficiency of the proposed method, and compared against a total of twelve graph kernels and deep graph neural networks on nine benchmark datasets widely used for testing the performance of graph kernels.
- The authors applied the method to widely-used graph classification benchmarks from multiple domains [29, 34, 44]; MUTAG, PTC-MR, ENZYMES, PROTEINS, NCI1, and NCI109 are graphs derived from small molecules and macromolecules, and IMDB-B, IMDB-M, and COLLAB are derived from social networks.
- All bioinformatics graph datasets have node labels while all other social network graphs have no node labels
- Detailed descriptions of these 9 datasets, including statistical properties, are provided in the Appendix
## Results:

The authors perform experiments to demonstrate the effectiveness and efficiency of the proposed method, and compare against a total of 12 graph kernels and deep graph neural networks on 9 benchmark datasets that are widely used for testing the performance of graph kernels.
- The authors use multithreading with a total of 12 threads in all experiments.
- All computations were carried out on a DELL dual-socket system with Intel Xeon processors at 2.93 GHz for a total of 16 cores and 250 GB of memory, running the SUSE Linux operating system.
## Conclusion:

The authors have presented a new family of p.d. and scalable global graph kernels that take into account global properties of graphs.
- The benefits of RGE are demonstrated by its much higher graph classification accuracy compared with other graph kernels and its linear scalability in terms of the number of graphs and graph size.
- Several interesting directions for future work are indicated: i) the graph embeddings generated by the technique can be applied and generalized to other learning problems such as graph matching or searching; ii) the RGE kernel should be extended to graphs with continuous node attributes and edge attributes

- Table 1: Comparison of classification accuracy against graph kernel methods without node labels
- Table 2: Comparison of classification accuracy against graph kernel methods with node labels or WL technique
- Table 3: Comparison of classification accuracy against recent deep learning models on graphs
- Table 4: Properties of the datasets

Related work

- In this section, we first make a brief survey of the existing graph kernels and then detail the difference between conventional random features method for vector inputs [24] and our random features method for structured inputs.

2.1 Graph Kernels

Generally speaking, we can categorize the existing graph kernels into two groups: kernels based on local sub-structures, and kernels based on global properties.

The first group of graph kernels compare sub-structures of graphs, following a general kernel-learning framework, i.e., R-convolution for discrete objects [10]. The major difference among these graph kernels is rooted in how they define and explore sub-structures to define a graph kernel, including random walks [9], shortest paths [4], cycles [12], subtree patterns [28], and graphlets [30]. A thread of research attempts to utilize node label information using the Weisfeiler-Leman (WL) test of isomorphism [29] and takes structural similarity between sub-structures into account [44, 45] to further improve the performance of kernels.
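As a concrete instance of this substructure family, the WL subtree kernel [29] can be sketched as follows. Graphs are given as adjacency lists with integer node labels; the shared label-compression table is the detail that keeps relabelled nodes comparable across the two graphs. This is an illustrative sketch, not the reference implementation:

```python
from collections import Counter

def wl_subtree_kernel(adj1, labels1, adj2, labels2, h=2):
    """Sketch of the WL subtree kernel: relabel both graphs jointly for
    h iterations, then sum the dot products of their label histograms."""
    def relabel(adj, labels, compress):
        # a node's new label compresses (own label, sorted neighbour labels)
        sigs = [(labels[v], tuple(sorted(labels[u] for u in adj[v])))
                for v in range(len(adj))]
        return [compress.setdefault(s, len(compress)) for s in sigs]

    k, l1, l2 = 0, list(labels1), list(labels2)
    for it in range(h + 1):
        c1, c2 = Counter(l1), Counter(l2)
        k += sum(c1[lab] * c2[lab] for lab in c1)  # histogram dot product
        if it < h:
            compress = {}  # shared table keeps labels comparable across graphs
            l1 = relabel(adj1, l1, compress)
            l2 = relabel(adj2, l2, compress)
    return k
```

For two identical uniformly labelled triangles and h = 2, each of the three rounds contributes 3 × 3 = 9, giving 27; each relabelling round costs time linear in the number of edges, which is what makes WL-based kernels comparatively scalable.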

Recently, a new class of graph kernels, which focus on the use of geometric node embeddings of graphs to capture global properties, has been proposed. These kernels have achieved state-of-the-art performance in the graph classification task [14, 15, 23]. The first global kernel was based on the Lovász number [20] and its associated orthonormal representation [14]. However, these kernels can only be applied to unlabelled graphs. Later approaches directly learn graph embeddings using landmarks [15] or compute a similarity matrix [23] by exploiting different matching schemes between the geometric node embeddings of a pair of graphs. Unfortunately, the resulting kernel matrix does not yield a valid p.d. kernel, which precludes the use of kernel support vector machines. Two recent graph kernels, the multiscale Laplacian kernel [16] and the optimal assignment kernel [17], were developed to overcome these limitations by building a p.d. kernel between node distributions or via histogram intersection.
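A small sketch of the matching-based approach described above may help: normalized-Laplacian eigenvectors serve as the geometric node embeddings and a Hungarian assignment as the matching. Both are illustrative choices in the spirit of [15, 23], not either paper's exact method:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def geometric_node_embeddings(A, k=2):
    """Embed nodes via the k smallest nontrivial eigenvectors of the
    normalized Laplacian L = I - D^{-1/2} A D^{-1/2} (dense adjacency A)."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))  # guard isolated nodes
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    return vecs[:, 1:k + 1]              # skip the trivial constant eigenvector

def matching_similarity(U, V):
    """Assignment-based similarity between two node-embedding sets:
    minimum-cost node transportation, mapped through exp(-cost).
    For graphs of unequal size, unmatched nodes are simply ignored."""
    C = np.linalg.norm(U[:, None, :] - V[None, :, :], axis=2)
    r, c = linear_sum_assignment(C)      # Hungarian matching
    return np.exp(-C[r, c].sum())
```

Pairwise scores of this form need not produce a positive semi-definite Gram matrix over a set of graphs, which is exactly the indefiniteness issue motivating the p.d. construction proposed in the paper.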

Reference

- Rami Al-Rfou, Dustin Zelle, and Bryan Perozzi. 2019. DDGK: Learning Graph Representations for Deep Divergence Graph Kernels. arXiv:1904.09671 (2019).
- James Atwood, Siddharth Pal, Don Towsley, and Ananthram Swami. 2016. Sparse Diffusion-Convolutional Neural Networks. In NIPS.
- Francis Bach. 2017. On the equivalence between kernel quadrature rules and random feature expansions. Journal of Machine Learning Research 18, 21 (2017), 1–38.
- Karsten M Borgwardt and Hans-Peter Kriegel. 2005. Shortest-path kernels on graphs. In Data Mining, Fifth IEEE International Conference on. IEEE, 8–pp.
- François Bourgeois and Jean-Claude Lassalle. 1971. An extension of the Munkres algorithm for the assignment problem to rectangular matrices. Commun. ACM 14, 12 (1971), 802–804.
- Pin-Yu Chen and Lingfei Wu. 2017. Revisiting spectral graph clustering with generative community models. In ICDM. 51–60.
- Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of machine learning research 9, Aug (2008), 1871–1874.
- Matthew Fisher, Manolis Savva, and Pat Hanrahan. 2011. Characterizing structural relationships in scenes using graph kernels. ACM Transactions on Graphics (TOG) 30, 4 (2011), 34.
- Thomas Gärtner, Peter Flach, and Stefan Wrobel. 2003. On graph kernels: Hardness results and efficient alternatives. In Learning Theory and Kernel Machines. Springer, 129–143.
- David Haussler. 1999. Convolution kernels on discrete structures. Technical Report. Department of Computer Science, University of California at Santa Cruz.
- Frank L Hitchcock. 1941. The distribution of a product from several sources to numerous localities. Studies in Applied Mathematics 20, 1-4 (1941), 224–230.
- Tamás Horváth, Thomas Gärtner, and Stefan Wrobel. 2004. Cyclic pattern kernels for predictive graph mining. In KDD. ACM, 158–167.
- Catalin Ionescu, Alin Popa, and Cristian Sminchisescu. 2017. Large-scale datadependent kernel approximation. In Artificial Intelligence and Statistics. 19–27.
- Fredrik Johansson, Vinay Jethava, Devdatt Dubhashi, and Chiranjib Bhattacharyya. 2014. Global graph kernels using geometric embeddings. In ICML.
- Fredrik D Johansson and Devdatt Dubhashi. 2015. Learning with similarity functions on graphs using matchings of geometric embeddings. In KDD. ACM, 467–476.
- Risi Kondor and Horace Pan. 2016. The multiscale laplacian graph kernel. In NIPS. 2990–2998.
- Nils M Kriege, Pierre-Louis Giscard, and Richard Wilson. 2016. On valid optimal assignment kernels and applications to graph classification. In NIPS. 1623–1631.
- Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. 2015. From word embeddings to document distances. In ICML. 957–966.
- Quoc Le, Tamás Sarlós, and Alex Smola. 2013. Fastfood-approximating kernel expansions in loglinear time. In ICML, Vol. 85.
- László Lovász. 1979. On the Shannon capacity of a graph. IEEE Transactions on Information theory 25, 1 (1979), 1–7.
- Fatemeh Mokhtari and Gholam-Ali Hossein-Zadeh. 2013. Decoding brain states using backward edge elimination and graph kernels in fMRI connectivity networks. Journal of neuroscience methods 212, 2 (2013), 259–268.
- Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In ICML. 2014–2023.
- Giannis Nikolentzos, Polykarpos Meladianos, and Michalis Vazirgiannis. 2017. Matching Node Embeddings for Graph Similarity.. In AAAI. 2429–2435.
- Ali Rahimi and Benjamin Recht. 2008. Random features for large-scale kernel machines. In NIPS. 1177–1184.
- Liva Ralaivola, Sanjay J Swamidass, Hiroto Saigo, and Pierre Baldi. 2005. Graph kernels for chemical informatics. Neural networks 18, 8 (2005), 1093–1110.
- Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. 2000. The earth mover’s distance as a metric for image retrieval. International journal of computer vision 40, 2 (2000), 99–121.
- Alessandro Rudi and Lorenzo Rosasco. 2017. Generalization properties of learning with random features. In NIPS. 3218–3228.
- Nino Shervashidze and Karsten M Borgwardt. 2009. Fast subtree kernels on graphs. In NIPS. 1660–1668.
- Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M Borgwardt. 2011. Weisfeiler-lehman graph kernels. Journal of Machine Learning Research 12, Sep (2011), 2539–2561.
- Nino Shervashidze, SVN Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten Borgwardt. 2009. Efficient graphlet kernels for large graph comparison. In AIStats. 488–495.
- Aman Sinha and John C Duchi. 2016. Learning kernels with random features. In NIPS. 1298–1306.
- Justin Solomon, Raif Rustamov, Leonidas Guibas, and Adrian Butscher. 2016. Continuous-flow graph transportation distances. arXiv:1603.06927 (2016).
- Andreas Stathopoulos and James R McCombs. 2010. PRIMME: preconditioned iterative multimethod eigensolver: methods and software description. ACM Transactions on Mathematical Software (TOMS) 37, 2 (2010), 21.
- S Vichy N Vishwanathan, Nicol N Schraudolph, Risi Kondor, and Karsten M Borgwardt. 2010. Graph kernels. Journal of Machine Learning Research 11 (2010), 1201–1242.
- Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and computing 17, 4 (2007), 395–416.
- Cynthia Wagner, Gerard Wagener, Radu State, and Thomas Engel. 2009. Malware analysis with graph kernels and support vector machines. In Malicious and Unwanted Software, 2009 4th International Conference on. IEEE, 63–68.
- Ling Wang and Hichem Sahbi. 2013. Directed acyclic graph kernels for action recognition. In ICCV. IEEE, 3168–3175.
- Bo Wu, Yang Liu, Bo Lang, and Lei Huang. 2017. DGCNN: Disordered Graph Convolutional Neural Network Based on the Gaussian Mixture Model. arXiv:1712.03563 (2017).
- Lingfei Wu, Pin-Yu Chen, Ian En-Hsu Yen, Fangli Xu, Yinglong Xia, and Charu Aggarwal. 2018. Scalable spectral clustering using random binning features. In KDD. ACM, 2506–2515.
- Lingfei Wu, Eloy Romero, and Andreas Stathopoulos. 2017. PRIMME_SVDS: A high-performance preconditioned SVD solver for accurate large-scale computations. SIAM Journal on Scientific Computing 39, 5 (2017), S248–S271.
- Lingfei Wu, Ian EH Yen, Jie Chen, and Rui Yan. 2016. Revisiting random binning features: Fast convergence and strong parallelizability. In KDD. ACM, 1265–1274.
- Lingfei Wu, Ian En-Hsu Yen, Fangli Xu, Pradeep Ravikuma, and Michael Witbrock. 2018. D2KE: From Distance to Kernel and Embedding. arXiv preprint arXiv:1802.04956 (2018).
- Lingfei Wu, Ian En-Hsu Yen, Jinfeng Yi, Fangli Xu, Qi Lei, and Michael Witbrock. 2018. Random Warping Series: A Random Features Method for Time-Series Embedding. In International Conference on Artificial Intelligence and Statistics. 793–802.
- Pinar Yanardag and SVN Vishwanathan. 2015. Deep graph kernels. In KDD. ACM, 1365–1374.
- Pinar Yanardag and SVN Vishwanathan. 2015. A structural smoothing framework for robust graph comparison. In NIPS. 2134–2142.
- Zhen Zhang, Mianzhi Wang, Yijian Xiang, Yan Huang, and Arye Nehorai. 2018. RetGK: Graph Kernels based on Return Probabilities of Random Walks. In NIPS. 3968–3978.
