SimGNN: A Neural Network Approach to Fast Graph Similarity Computation

WSDM, pp. 384-392, 2019.

DOI: https://doi.org/10.1145/3289600.3290967

Abstract:

Graph similarity search is among the most important graph-based applications, e.g., finding the chemical compounds that are most similar to a query compound. Graph similarity/distance computation, such as Graph Edit Distance (GED) and Maximum Common Subgraph (MCS), is the core operation of graph similarity search and many other applications. […]
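For intuition about what GED measures, here is a toy brute-force computation for small unlabeled graphs of equal size (the helper name `ged_same_size` is ours, not from the paper). With no node labels and equal node counts, GED reduces to the minimum number of edge insertions and deletions over all node bijections; the exponential loop over permutations is exactly why the approximate methods this paper targets exist.

```python
from itertools import permutations

def ged_same_size(edges1, edges2, n):
    """Toy GED for unlabeled n-node graphs: minimum edge edit cost
    over all bijective node mappings (exponential -- illustration only)."""
    e2 = {frozenset(e) for e in edges2}
    best = float("inf")
    for perm in permutations(range(n)):
        mapped = {frozenset((perm[a], perm[b])) for a, b in edges1}
        best = min(best, len(mapped ^ e2))  # edge deletions + insertions
    return best

path = [(0, 1), (1, 2)]             # 0-1-2
triangle = [(0, 1), (1, 2), (2, 0)]
print(ged_same_size(path, triangle, 3))  # 1: insert one edge
```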

Introduction
  • Graphs are ubiquitous nowadays and have a wide range of applications in bioinformatics, chemistry, recommender systems, social network study, program static analysis, etc.
  • The fundamental problem of the exponential time complexity of exact graph similarity computation [28] remains.
  • Instead of calculating the exact similarity metric, these methods find approximate values in a fast and heuristic way [2, 6, 9, 28, 35].
  • These methods usually require rather complicated design and implementation based on discrete optimization or combinatorial search.
  • The time complexity is usually still polynomial or even sub-exponential in the number of nodes in the graphs, such as A*-Beamsearch (Beam) [28], Hungarian [35], and VJ [9].
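Bipartite-matching baselines such as Hungarian [35] and VJ [9] cast approximate GED as an assignment problem between the two node sets. A minimal sketch with SciPy's `linear_sum_assignment`; the cost matrix here is just absolute degree difference, a toy stand-in for the richer local edit costs the real methods use:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

deg1 = np.array([1, 2, 1])  # node degrees of a 3-node path
deg2 = np.array([2, 2, 2])  # node degrees of a triangle

# Toy assignment cost: absolute degree difference between paired nodes.
cost = np.abs(deg1[:, None] - deg2[None, :])
rows, cols = linear_sum_assignment(cost)  # optimal assignment in O(n^3)
print(int(cost[rows, cols].sum()))  # 2 -> a rough edit-cost estimate
```

The polynomial-time assignment solve is the source of the speedup over exact (exponential) GED, at the price of approximation error.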
Highlights
  • Given the huge importance yet great difficulty of computing the exact graph distances, there have been two broad categories of methods to address the problem of graph similarity search
  • The first category of remedies is the pruning-verification framework [26, 48, 49], under which the total amount of exact graph similarity computations for a query can be reduced to a tractable degree, via a series of database indexing techniques and pruning strategies
  • We propose a novel attention mechanism to select the important nodes out of an entire graph with respect to a specific similarity metric
  • We introduce our proposed approach, SimGNN, in detail: an end-to-end neural network based approach that attempts to learn a function mapping a pair of graphs to a similarity score
  • Our model runs very fast compared to existing classic algorithms on approximate Graph Edit Distance computation, and achieves very competitive accuracy.
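As a rough illustration of the attention idea in the highlights (a sketch, not the paper's exact architecture; the matrix `W` is random here but learned in the real model), a global context vector summarizes the graph, and each node is weighted by its agreement with that context:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(5, 8))   # 5 node embeddings of dimension 8
W = rng.normal(size=(8, 8))   # stand-in for a learned weight matrix

c = np.tanh(U.mean(axis=0) @ W)       # global graph context
a = 1.0 / (1.0 + np.exp(-(U @ c)))    # per-node attention weight in (0, 1)
h = (a[:, None] * U).sum(axis=0)      # attention-weighted graph embedding
print(h.shape)  # (8,)
```

Because `W` is trained end-to-end against the similarity metric, the attention can learn which nodes matter for that specific metric rather than using a fixed pooling rule.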
Methods
  • SimGNN achieves mse(10⁻³) 1.189, ρ 0.843, τ 0.690, and p@10 0.421 on AIDS, and mse(10⁻³) 1.509 and ρ 0.939 on LINUX (full per-dataset results in Tables 2 and 3).
Results
  • Evaluation Metrics. The following metrics are used to evaluate all the models: Time; mean squared error (mse), which measures the average squared difference between the computed similarities and the ground-truth similarities; Spearman’s Rank Correlation Coefficient (ρ) [39] and Kendall’s Rank Correlation Coefficient (τ) [20], which measure how well the predicted ranking matches the true ranking; and Precision at k (p@k). Compared with p@k, ρ and τ evaluate the global ranking results instead of focusing on the top k results.
  • Baseline approaches include Hungarian [35], an algorithm for bipartite graph matching, and the VJ [9] algorithm.
  • Ablation variants of SimGNN include: (1) SimpleMean, which takes the unweighted average of all the node embeddings of a graph; (2) HierarchicalMean and (3) HierarchicalMax, based on graph coarsening, which use global mean or max pooling to generate a graph hierarchy; (4) AttDegree, which uses the natural log of the degree of a node as its attention weight, as described in Section 3.1.2; (5) AttGlobalContext; and (6) AttLearnableGC.
Conclusion
  • The authors work at the intersection of graph deep learning and the graph search problem, and take a first step towards bridging the gap by tackling the core operation of graph similarity computation via a novel neural network based approach.
  • The central idea is to learn a neural network based function that is representation-invariant, inductive, and adaptive to the specific similarity metric, which takes any two graphs as input and outputs their similarity score.
  • The authors' model runs very fast compared to existing classic algorithms on approximate Graph Edit Distance computation, and achieves very competitive accuracy.
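One common way (and, as an assumption about this setup, the normalization typically paired with GED-based training targets) to turn a graph edit distance into the bounded similarity score such a function predicts is to divide the GED by the average graph size and exponentiate:

```python
import math

def ged_to_similarity(ged, n1, n2):
    """Map a GED to (0, 1]: 1.0 for identical graphs, -> 0 as GED grows."""
    nged = ged / ((n1 + n2) / 2.0)  # size-normalized GED
    return math.exp(-nged)

print(ged_to_similarity(0, 3, 3))            # 1.0
print(round(ged_to_similarity(1, 3, 3), 4))  # 0.7165
```

The normalization keeps scores comparable across graph sizes, so one regression model can be trained on pairs of differently sized graphs.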
Tables
  • Table1: Statistics of datasets
  • Table2: Results on AIDS
  • Table3: Results on LINUX
  • Table4: Results on IMDB. Beam, Hungarian, and VJ together are used to determine the ground-truth results
Related work
  • 5.1 Network/Graph Embedding

    Node-level embedding. Over the years, several categories of methods have been proposed for learning node representations, including matrix factorization based methods (NetMF [32]) and skip-gram based methods (DeepWalk [31], Node2Vec [12], LINE [40]).

    Future work. There are several directions for future work: (1) the model can handle graphs with node types but cannot process edge features; in chemistry, the bonds of a chemical compound are usually labeled, so it would be useful to incorporate edge labels into the model. (2) It is promising to explore techniques to further boost the precision at the top k results, which is not preserved well, mainly due to the skewed similarity distribution in the training dataset. (3) Given the constraint that exact GEDs for large graphs cannot be computed, it would be interesting to see how the learned model, trained only on the exact GEDs between small graphs, generalizes to large graphs.
Funding
  • The work is supported in part by NSF DBI 1565137, NSF DGE1829071, NSF III-1705169, NSF CAREER Award 1741634, NIH U01HG008488, NIH R01GM115833, Snapchat gift funds, and PPDai gift fund.
References
  • David B Blumenthal and Johann Gamper. 2018. On the exact computation of the graph edit distance. Pattern Recognition Letters (2018).
  • Sebastien Bougleux, Luc Brun, Vincenzo Carletti, Pasquale Foggia, Benoit Gaüzère, and Mario Vento. 2017. Graph edit distance as a quadratic assignment problem. Pattern Recognition Letters 87 (2017), 38–46.
  • Horst Bunke. 1983. What is the distance between graphs? Bulletin of the EATCS 20 (1983), 35–39.
  • Horst Bunke. 1997. On a relation between graph edit distance and maximum common subgraph. Pattern Recognition Letters 18, 8 (1997), 689–694.
  • Horst Bunke and Kim Shearer. 1998. A graph distance metric based on the maximal common subgraph. Pattern recognition letters 19, 3-4 (1998), 255–259.
  • Évariste Daller, Sébastien Bougleux, Benoit Gaüzère, and Luc Brun. 2018. Approximate graph edit distance by several local searches in parallel. In ICPRAM.
  • Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS. 3844–3852.
  • David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. 2015. Convolutional networks on graphs for learning molecular fingerprints. In NIPS. 2224–2232.
  • Stefan Fankhauser, Kaspar Riesen, and Horst Bunke. 2011. Speeding up graph edit distance computation through fast bipartite matching. In International Workshop on Graph-Based Representations in Pattern Recognition. Springer, 102–111.
  • Andreas Fischer, Ching Y Suen, Volkmar Frinken, Kaspar Riesen, and Horst Bunke. 2013. A fast matching algorithm for graph-based handwriting recognition. In International Workshop on Graph-Based Representations in Pattern Recognition. Springer, 194–203.
  • Thomas Gärtner, Peter Flach, and Stefan Wrobel. 2003. On graph kernels: Hardness results and efficient alternatives. In COLT. Springer, 129–143.
  • Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In SIGKDD. ACM, 855–864.
  • Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NIPS. 1024–1034.
  • William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation learning on graphs: Methods and applications. Data Engineering Bulletin (2017).
  • Hua He and Jimmy Lin. 2016. Pairwise word interaction modeling with deep neural networks for semantic similarity measurement. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 937–948.
  • Tamás Horváth, Thomas Gärtner, and Stefan Wrobel. 2004. Cyclic pattern kernels for predictive graph mining. In SIGKDD. ACM, 158–167.
  • Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. 2014. Convolutional neural network architectures for matching natural language sentences. In NIPS. 2042–2050.
  • Roy Jonker and Anton Volgenant. 1987. A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38, 4 (1987), 325–340.
  • Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, and Patrick Riley. 2016. Molecular graph convolutions: moving beyond fingerprints. Journal of computer-aided molecular design 30, 8 (2016), 595–608.
  • Maurice G Kendall. 1938. A new measure of rank correlation. Biometrika 30, 1/2 (1938), 81–93.
  • Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. ICLR (2015).
  • Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. ICLR (2016).
  • Harold W Kuhn. 1955. The Hungarian method for the assignment problem. Naval research logistics quarterly 2, 1-2 (1955), 83–97.
  • John Boaz Lee, Ryan Rossi, and Xiangnan Kong. 2018. Graph Classification using Structural Attention. In SIGKDD. ACM, 1666–1674.
  • Vladimir I Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady, Vol. 10. 707–710.
  • Yongjiang Liang and Peixiang Zhao. 2017. Similarity search in graph databases: A multi-layered indexing approach. In ICDE. IEEE, 783–794.
  • Tengfei Ma, Cao Xiao, Jiayu Zhou, and Fei Wang. 2018. Drug Similarity Integration Through Attentive Multi-view Graph Auto-Encoders. IJCAI (2018).
  • Michel Neuhaus, Kaspar Riesen, and Horst Bunke. 2006. Fast suboptimal algorithms for the computation of graph edit distance. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). Springer, 163–172.
  • Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In ICML. 2014–2023.
  • Giannis Nikolentzos, Polykarpos Meladianos, and Michalis Vazirgiannis. 2017. Matching Node Embeddings for Graph Similarity. In AAAI. 2429–2435.
  • Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In SIGKDD. ACM, 701–710.
  • Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM. ACM, 459–467.
  • Rashid Jalal Qureshi, Jean-Yves Ramel, and Hubert Cardot. 2007. Graph based shapes representation and recognition. In International Workshop on Graph-Based Representations in Pattern Recognition. Springer, 49–60.
  • Kaspar Riesen and Horst Bunke. 2008. IAM graph database repository for graph based pattern recognition and machine learning. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). Springer, 287–297.
  • Kaspar Riesen and Horst Bunke. 2009. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision computing 27, 7 (2009), 950–959.
  • Kaspar Riesen, Sandro Emmenegger, and Horst Bunke. 2013. A novel software toolkit for graph edit distance computation. In International Workshop on GraphBased Representations in Pattern Recognition. Springer, 142–151.
  • Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In ESWC. Springer, 593–607.
  • Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In NIPS. 926–934.
  • Charles Spearman. 1904. The proof and measurement of association between two things. The American journal of psychology 15, 1 (1904), 72–101.
  • Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW. International World Wide Web Conferences Steering Committee, 1067–1077.
  • Kiran K Thekumparampil, Chong Wang, Sewoong Oh, and Li-Jia Li. 2018. Attention-based Graph Neural Network for Semi-supervised Learning. ICLR (2018).
  • Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. ICLR (2018).
  • Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In SIGKDD. ACM, 1225–1234.
  • Xiaoli Wang, Xiaofeng Ding, Anthony KH Tung, Shanshan Ying, and Hai Jin. 2012. An efficient graph indexing method. In ICDE. IEEE, 210–221.
  • Bing Xiao, Xinbo Gao, Dacheng Tao, and Xuelong Li. 2008. HMM-based graph edit distance for image indexing. International Journal of Imaging Systems and Technology 18, 2-3 (2008), 209–218.
  • Pinar Yanardag and SVN Vishwanathan. 2015. Deep graph kernels. In SIGKDD. ACM, 1365–1374.
  • Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, William L Hamilton, and Jure Leskovec. 2018. Hierarchical Graph Representation Learning with Differentiable Pooling. arXiv preprint arXiv:1806.08804 (2018).
  • Zhiping Zeng, Anthony KH Tung, Jianyong Wang, Jianhua Feng, and Lizhu Zhou. 2009. Comparing stars: On approximating graph edit distance. PVLDB 2, 1 (2009), 25–36.
  • Xiang Zhao, Chuan Xiao, Xuemin Lin, Qing Liu, and Wenjie Zhang. 2013. A partition-based approach to structure similarity search. PVLDB 7, 3 (2013), 169–180.
  • Xiaohan Zhao, Bo Zong, Ziyu Guan, Kai Zhang, and Wei Zhao. 2018. Substructure Assembling Network for Graph Classification. AAAI (2018).
  • Weiguo Zheng, Lei Zou, Xiang Lian, Dong Wang, and Dongyan Zhao. 2013. Graph similarity search with edit distance constraint in large graph databases. In CIKM. ACM, 1595–1600.