# SimGNN - A Neural Network Approach to Fast Graph Similarity Computation

WSDM, pp. 384-392, 2019.

Abstract:

Graph similarity search is among the most important graph-based applications, e.g. finding the chemical compounds that are most similar to a query compound. Graph similarity/distance computation, such as Graph Edit Distance (GED) and Maximum Common Subgraph (MCS), is the core operation of graph similarity search and many other applications…


Introduction

- Graphs are ubiquitous nowadays and have a wide range of applications in bioinformatics, chemistry, recommender systems, social network study, program static analysis, etc.
- The fundamental problem of the exponential time complexity of exact graph similarity computation [28] remains.
- Instead of calculating the exact similarity metric, these methods find approximate values in a fast and heuristic way [2, 6, 9, 28, 35].
- These methods usually require rather complicated design and implementation based on discrete optimization or combinatorial search.
- The time complexity is usually still polynomial or even sub-exponential in the number of nodes in the graphs, for methods such as A*-Beamsearch (Beam) [28], Hungarian [35], and VJ [9].
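The tractability issue above can be seen concretely with NetworkX, whose exact `graph_edit_distance` is exponential in the worst case, while `optimize_graph_edit_distance` yields an anytime sequence of successively better approximations. This is a minimal sketch for illustration, not one of the paper's own baselines:

```python
import networkx as nx

# Two small graphs: a triangle and a path on three nodes.
g1 = nx.cycle_graph(3)   # 3 nodes, 3 edges
g2 = nx.path_graph(3)    # 3 nodes, 2 edges

# Exact GED: feasible only for tiny graphs (exponential worst case).
exact = nx.graph_edit_distance(g1, g2)
print(exact)  # 1.0 -- deleting one edge turns the triangle into the path

# Anytime approximation: the generator yields successively tighter
# upper bounds; stopping early gives a fast heuristic answer.
approx = next(nx.optimize_graph_edit_distance(g1, g2))
print(approx)
```

On graphs with more than a handful of nodes, the exact call quickly becomes infeasible, which is exactly the gap the approximate methods above (and SimGNN) target.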

Highlights

- Graphs are ubiquitous nowadays and have a wide range of applications in bioinformatics, chemistry, recommender systems, social network study, program static analysis, etc
- Given the huge importance yet great difficulty of computing the exact graph distances, there have been two broad categories of methods to address the problem of graph similarity search
- The first category of remedies is the pruning-verification framework [26, 48, 49], under which the total amount of exact graph similarity computations for a query can be reduced to a tractable degree, via a series of database indexing techniques and pruning strategies
- We propose a novel attention mechanism to select the important nodes out of an entire graph with respect to a specific similarity metric
- We introduce our proposed approach SimGNN in detail, which is an end-to-end neural network based approach that attempts to learn a function to map a pair of graphs into a similarity score
- Our model runs very fast compared to existing classic algorithms on approximate Graph Edit Distance computation, and achieves very competitive accuracy
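The node-attention idea in the highlights can be sketched in NumPy: each node embedding is weighted by its similarity to a nonlinear global graph context. This is a minimal illustrative sketch; the weight matrix `W` is randomly initialized here, whereas in SimGNN it would be learned end-to-end with the rest of the network:

```python
import numpy as np

def global_context_attention(U, W):
    """Weight each node embedding by its similarity to a global context.

    U: (N, D) node embeddings; W: (D, D) weight matrix (learned in practice).
    Returns a single (D,) graph-level embedding.
    """
    c = np.tanh(U.mean(axis=0) @ W)          # global context vector, shape (D,)
    a = 1.0 / (1.0 + np.exp(-(U @ c)))       # sigmoid attention weight per node, shape (N,)
    return (a[:, None] * U).sum(axis=0)      # attention-weighted sum, shape (D,)

rng = np.random.default_rng(0)
U = rng.standard_normal((5, 4))              # 5 nodes, 4-dim embeddings
W = rng.standard_normal((4, 4))
h = global_context_attention(U, W)
print(h.shape)  # (4,)
```

Nodes similar to the graph-level context receive weights near 1 and dominate the graph embedding, which is how the model focuses on "important" nodes with respect to the similarity metric.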

Methods

- SimGNN: mse(10⁻³) 1.189, ρ 0.843, τ 0.690, p@10 0.421 on one benchmark; mse(10⁻³) 1.509, ρ 0.939 on another (full results in Tables 2–4)

Results

**Evaluation Metrics**

The following metrics are used to evaluate all the models: running time, mse, ρ, τ, and p@k.

- Mean squared error (mse) measures the average squared difference between the computed similarities and the ground-truth similarities.
- Spearman’s Rank Correlation Coefficient (ρ) [39] and Kendall’s Rank Correlation Coefficient (τ) [20] measure how well the predicted ranking matches the true ranking; compared with p@k, ρ and τ evaluate the global ranking results instead of focusing on the top k results.
- Baselines include Hungarian, an algorithm for bipartite graph matching, and several embedding-based variants: (1) SimpleMean takes the unweighted average of all the node embeddings of a graph; other baselines are based on graph coarsening, using global mean or max pooling to generate a graph hierarchy; one variant uses the natural log of the degree of a node as its attention weight, as described in Section 3.1.2; (5) AttGlobalContext and (6) AttLearn…
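These metrics are straightforward to compute; a small sketch using SciPy’s built-in rank correlations and a hypothetical `precision_at_k` helper, with made-up scores for six graphs:

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

# Hypothetical predicted vs. ground-truth similarity scores for 6 graphs.
pred = np.array([0.9, 0.7, 0.8, 0.2, 0.4, 0.1])
true = np.array([0.95, 0.75, 0.6, 0.3, 0.35, 0.05])

mse = np.mean((pred - true) ** 2)   # mean squared error
rho, _ = spearmanr(pred, true)      # Spearman's rank correlation
tau, _ = kendalltau(pred, true)     # Kendall's rank correlation

def precision_at_k(pred, true, k):
    """p@k: fraction of the predicted top-k that is in the true top-k."""
    top_pred = set(np.argsort(-pred)[:k])
    top_true = set(np.argsort(-true)[:k])
    return len(top_pred & top_true) / k

print(round(mse, 4), round(rho, 3), round(tau, 3), precision_at_k(pred, true, 2))
# -> 0.01 0.943 0.867 0.5
```

Note how ρ and τ stay high here even though p@2 is only 0.5: the global ranking is mostly right, but one swap near the top halves the top-k precision, which is why the paper reports both kinds of metrics.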

Conclusion

- The authors work at the intersection of graph deep learning and the graph search problem, and take a first step towards bridging the gap by tackling the core operation of graph similarity computation via a novel neural network based approach.
- The central idea is to learn a neural network based function that is representation-invariant, inductive, and adaptive to the specific similarity metric, which takes any two graphs as input and outputs their similarity score.
- The authors' model runs very fast compared to existing classic algorithms on approximate Graph Edit Distance computation, and achieves very competitive accuracy.
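As background on the learning target, a raw GED can be mapped to a similarity score in (0, 1] by normalizing for graph size and exponentiating. The sketch below shows this common normalize-then-exponentiate transform; the paper’s exact normalization may differ in detail:

```python
import math

def ged_to_similarity(ged, n1, n2):
    """Map a graph edit distance to a similarity score in (0, 1].

    Normalizes GED by the average graph size, then applies exp(-x), so
    identical graphs (GED 0) get similarity 1.0 and larger distances
    decay toward 0.
    """
    nged = ged / ((n1 + n2) / 2.0)
    return math.exp(-nged)

print(ged_to_similarity(0, 10, 10))            # 1.0
print(round(ged_to_similarity(5, 10, 10), 4))  # exp(-0.5) ~ 0.6065
```

Bounding the target in (0, 1] is what lets a network with a sigmoid-style output head regress the score directly.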


- Table1: Statistics of datasets
- Table2: Results on AIDS
- Table3: Results on LINUX
- Table4: Results on IMDB. Beam, Hungarian, and VJ together are used to determine the ground-truth results

Related work

- 5.1 Network/Graph Embedding

Node-level embedding. Over the years, several categories of methods have been proposed for learning node representations, including matrix factorization based methods (NetMF [32]) and skip-gram based methods (DeepWalk [31], Node2Vec [12], LINE [40]).

There are several directions for future work: (1) the model can handle graphs with node types but cannot process edge features; in chemistry, bonds of a chemical compound are usually labeled, so it would be useful to incorporate edge labels into the model; (2) it is promising to explore different techniques to further boost the precision of the top k results, which is not preserved well, mainly due to the skewed similarity distribution in the training dataset; and (3) given the constraint that exact GEDs for large graphs cannot be computed, it would be interesting to see how the learned model, trained only on exact GEDs between small graphs, generalizes to large graphs.

Funding

- The work is supported in part by NSF DBI 1565137, NSF DGE1829071, NSF III-1705169, NSF CAREER Award 1741634, NIH U01HG008488, NIH R01GM115833, Snapchat gift funds, and PPDai gift fund

References

- David B Blumenthal and Johann Gamper. 2018. On the exact computation of the graph edit distance. Pattern Recognition Letters (2018).
- Sebastien Bougleux, Luc Brun, Vincenzo Carletti, Pasquale Foggia, Benoit Gaüzère, and Mario Vento. 2017. Graph edit distance as a quadratic assignment problem. Pattern Recognition Letters 87 (2017), 38–46.
- H. Bunke. 1983. What is the distance between graphs? Bulletin of the EATCS 20 (1983), 35–39.
- Horst Bunke. 1997. On a relation between graph edit distance and maximum common subgraph. Pattern Recognition Letters 18, 8 (1997), 689–694.
- Horst Bunke and Kim Shearer. 1998. A graph distance metric based on the maximal common subgraph. Pattern recognition letters 19, 3-4 (1998), 255–259.
- Évariste Daller, Sébastien Bougleux, Benoit Gaüzère, and Luc Brun. 2018. Approximate graph edit distance by several local searches in parallel. In ICPRAM.
- Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS. 3844–3852.
- David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. 2015. Convolutional networks on graphs for learning molecular fingerprints. In NIPS. 2224–2232.
- Stefan Fankhauser, Kaspar Riesen, and Horst Bunke. 2011. Speeding up graph edit distance computation through fast bipartite matching. In International Workshop on Graph-Based Representations in Pattern Recognition. Springer, 102–111.
- Andreas Fischer, Ching Y Suen, Volkmar Frinken, Kaspar Riesen, and Horst Bunke. 2013. A fast matching algorithm for graph-based handwriting recognition. In International Workshop on Graph-Based Representations in Pattern Recognition. Springer, 194–203.
- Thomas Gärtner, Peter Flach, and Stefan Wrobel. 2003. On graph kernels: Hardness results and efficient alternatives. In COLT. Springer, 129–143.
- Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In SIGKDD. ACM, 855–864.
- Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NIPS. 1024–1034.
- William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation learning on graphs: Methods and applications. Data Engineering Bulletin (2017).
- Hua He and Jimmy Lin. 2016. Pairwise word interaction modeling with deep neural networks for semantic similarity measurement. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 937–948.
- Tamás Horváth, Thomas Gärtner, and Stefan Wrobel. 2004. Cyclic pattern kernels for predictive graph mining. In SIGKDD. ACM, 158–167.
- Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. 2014. Convolutional neural network architectures for matching natural language sentences. In NIPS. 2042–2050.
- Roy Jonker and Anton Volgenant. 1987. A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38, 4 (1987), 325–340.
- Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, and Patrick Riley. 2016. Molecular graph convolutions: moving beyond fingerprints. Journal of computer-aided molecular design 30, 8 (2016), 595–608.
- Maurice G Kendall. 1938. A new measure of rank correlation. Biometrika 30, 1/2 (1938), 81–93.
- Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. ICLR (2015).
- Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. ICLR (2016).
- Harold W Kuhn. 1955. The Hungarian method for the assignment problem. Naval research logistics quarterly 2, 1-2 (1955), 83–97.
- John Boaz Lee, Ryan Rossi, and Xiangnan Kong. 2018. Graph Classification using Structural Attention. In SIGKDD. ACM, 1666–1674.
- Vladimir I Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady, Vol. 10. 707–710.
- Yongjiang Liang and Peixiang Zhao. 2017. Similarity search in graph databases: A multi-layered indexing approach. In ICDE. IEEE, 783–794.
- Tengfei Ma, Cao Xiao, Jiayu Zhou, and Fei Wang. 2018. Drug Similarity Integration Through Attentive Multi-view Graph Auto-Encoders. IJCAI (2018).
- Michel Neuhaus, Kaspar Riesen, and Horst Bunke. 2006. Fast suboptimal algorithms for the computation of graph edit distance. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). Springer, 163–172.
- Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In ICML. 2014–2023.
- Giannis Nikolentzos, Polykarpos Meladianos, and Michalis Vazirgiannis. 2017. Matching Node Embeddings for Graph Similarity. In AAAI. 2429–2435.
- Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In SIGKDD. ACM, 701–710.
- Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM. ACM, 459–467.
- Rashid Jalal Qureshi, Jean-Yves Ramel, and Hubert Cardot. 2007. Graph based shapes representation and recognition. In International Workshop on Graph-Based Representations in Pattern Recognition. Springer, 49–60.
- Kaspar Riesen and Horst Bunke. 2008. IAM graph database repository for graph based pattern recognition and machine learning. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). Springer, 287–297.
- Kaspar Riesen and Horst Bunke. 2009. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision computing 27, 7 (2009), 950–959.
- Kaspar Riesen, Sandro Emmenegger, and Horst Bunke. 2013. A novel software toolkit for graph edit distance computation. In International Workshop on GraphBased Representations in Pattern Recognition. Springer, 142–151.
- Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In ESWC. Springer, 593–607.
- Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In NIPS. 926–934.
- Charles Spearman. 1904. The proof and measurement of association between two things. The American journal of psychology 15, 1 (1904), 72–101.
- Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW. International World Wide Web Conferences Steering Committee, 1067–1077.
- Kiran K Thekumparampil, Chong Wang, Sewoong Oh, and Li-Jia Li. 2018. Attention-based Graph Neural Network for Semi-supervised Learning. ICLR (2018).
- Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. ICLR (2018).
- Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In SIGKDD. ACM, 1225–1234.
- Xiaoli Wang, Xiaofeng Ding, Anthony KH Tung, Shanshan Ying, and Hai Jin. 2012. An efficient graph indexing method. In ICDE. IEEE, 210–221.
- Bing Xiao, Xinbo Gao, Dacheng Tao, and Xuelong Li. 2008. HMM-based graph edit distance for image indexing. International Journal of Imaging Systems and Technology 18, 2-3 (2008), 209–218.
- Pinar Yanardag and SVN Vishwanathan. 2015. Deep graph kernels. In SIGKDD. ACM, 1365–1374.
- Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, William L Hamilton, and Jure Leskovec. 2018. Hierarchical Graph Representation Learning with Differentiable Pooling. arXiv preprint arXiv:1806.08804 (2018).
- Zhiping Zeng, Anthony KH Tung, Jianyong Wang, Jianhua Feng, and Lizhu Zhou. 2009. Comparing stars: On approximating graph edit distance. PVLDB 2, 1 (2009), 25–36.
- Xiang Zhao, Chuan Xiao, Xuemin Lin, Qing Liu, and Wenjie Zhang. 2013. A partition-based approach to structure similarity search. PVLDB 7, 3 (2013), 169– 180.
- Xiaohan Zhao, Bo Zong, Ziyu Guan, Kai Zhang, and Wei Zhao. 2018. Substructure Assembling Network for Graph Classification. AAAI (2018).
- Weiguo Zheng, Lei Zou, Xiang Lian, Dong Wang, and Dongyan Zhao. 2013. Graph similarity search with edit distance constraint in large graph databases. In CIKM. ACM, 1595–1600.
