Open Graph Benchmark: Datasets for Machine Learning on Graphs

NeurIPS 2020.

Keywords:
Graph Convolutional Networks, Open Graph Benchmark, Knowledge Graph, graph benchmark, graph information

Abstract:

We present the Open Graph Benchmark (OGB), a diverse set of challenging and realistic benchmark datasets to facilitate scalable, robust, and reproducible graph machine learning (ML) research. OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains, ranging from social and information networks to biological networks, molecular graphs, and knowledge graphs.

Introduction
Highlights
  • Graphs are widely used for abstracting complex systems of interacting objects, such as social networks (Easley et al, 2010), knowledge graphs (Nickel et al, 2015), molecular graphs (Wu et al, 2018), and biological networks (Barabasi & Oltvai, 2004), as well as for modeling 3D objects (Simonovsky & Komodakis, 2017), manifolds (Bronstein et al, 2017), and source code (Allamanis et al, 2017)
  • We compare datasets from diverse application domains by inspecting their basic graph statistics, e.g., node degree, clustering coefficient, and diameter. We show that they exhibit diverse graph characteristics, which is crucial for evaluating the versatility of graph machine learning (ML) models (see the statistics sketch after this list)
  • To facilitate scalable, robust, and reproducible graph ML research, we introduce the Open Graph Benchmark (OGB)—a diverse set of realistic graph datasets in terms of scales, domains, and task categories
  • Our aim is for OGB to push the frontier of graph ML research
  • Our immediate future plan is to increase the coverage in Table 1 by adding large graph datasets with over 10 million nodes, as well as heterogeneous knowledge graphs
  • The paper will be updated as more datasets are included in OGB
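The basic graph statistics above (and in Table 3) can be reproduced along the following lines. The paper computes them with the SNAP library; this minimal sketch uses NetworkX as a stand-in and mirrors the Table 3 procedure of approximating the diameter by BFS from 1,000 randomly-sampled nodes:

```python
import random
import networkx as nx

def approx_diameter(G, n_samples=1000):
    """Lower-bound the diameter by running BFS from randomly sampled
    sources and taking the largest shortest-path distance observed."""
    sources = random.sample(list(G.nodes), min(n_samples, G.number_of_nodes()))
    diam = 0
    for s in sources:
        dist = nx.single_source_shortest_path_length(G, s)
        diam = max(diam, max(dist.values()))
    return diam

def basic_stats(G):
    # Directed graphs are symmetrized first, as in Table 3.
    if G.is_directed():
        G = G.to_undirected()
    return {
        "avg_degree": 2 * G.number_of_edges() / G.number_of_nodes(),
        "avg_clustering": nx.average_clustering(G),  # exact; slow on very large graphs
        "approx_diameter": approx_diameter(G),
    }
```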
Methods
  • Accuracy (%) on the training/validation/test splits is reported for the MLP, NODE2VEC, GCN†, and GRAPHSAGE† baselines (table omitted here). The mini-batch versions of GNNs perform worse than their full-batch counterparts, which raises a research opportunity to improve the existing mini-batch training techniques of GNNs.
  • The same trend holds true for the other MOLECULENET datasets, e.g., the best GIN performance on the random split of ogbg-moltox21 is 86.03±1.37% ROC-AUC, which is 8.46 percentage points higher than that of the best GIN on the scaffold split (77.57±0.62% ROC-AUC)
  • These results highlight the challenge of the scaffold split compared to the random split, and open up a fruitful research opportunity to increase the out-of-distribution generalization capability of GNNs (a data-loading sketch follows this list)
  • The edges are associated with 7-dimensional features, where each element takes a value between 0 and 1 and represents the strength of a particular type of protein-protein association, such as gene co-occurrence, gene fusion events, and co-expression
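The scaffold-versus-random comparison above is exactly what the standardized OGB splits pin down. A minimal sketch of loading ogbg-molhiv with its adopted scaffold split and ROC-AUC evaluator via the ogb package (class and method names as of the time of writing; the label and score tensors are placeholders):

```python
import torch
from ogb.graphproppred import PygGraphPropPredDataset, Evaluator

# The dataset ships with the standardized (scaffold) split:
dataset = PygGraphPropPredDataset(name="ogbg-molhiv")
split_idx = dataset.get_idx_split()
train_set = dataset[split_idx["train"]]  # likewise "valid" and "test"

# Each dataset carries its own evaluator and metric (ROC-AUC here).
evaluator = Evaluator(name="ogbg-molhiv")
y_true = torch.randint(0, 2, (100, 1))  # placeholder binary labels
y_pred = torch.rand(100, 1)             # placeholder scores
print(evaluator.eval({"y_true": y_true, "y_pred": y_pred}))  # {'rocauc': ...}
```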
Conclusion
  • The authors' initial benchmarking results in Table 4 show that the highest test performances are attained by GNN architectures, while the MLP baseline that solely relies on a product’s description is not sufficient for accurately predicting the category of a product.
  • The MLP baseline performs extremely poorly, which is to be expected, since the node features in this dataset are not rich
  • Both GNN baselines (GCN, GRAPHSAGE) and NODE2VEC do not overfit the training data and show similar performance across the training/validation/test splits.
  • All four models are able to achieve higher MRR on the training, validation, and test sets, as seen from the bottom half of Table 10
  • This suggests the importance of using sufficient embedding dimensionality to achieve good performance on this dataset (see the MRR sketch after this list).
  • The paper will be updated as more datasets are included in OGB
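For reference, MRR (mean reciprocal rank), the metric reported in Table 10, ranks each true edge's score against the scores of its negative candidates. A self-contained sketch; pos_score and neg_score are hypothetical stand-ins for model outputs:

```python
import torch

def mean_reciprocal_rank(pos_score: torch.Tensor, neg_score: torch.Tensor) -> float:
    """pos_score: (num_queries,) score of the true edge for each query.
    neg_score: (num_queries, num_negatives) scores of corrupted edges.
    Rank of the positive = 1 + number of negatives scoring above it
    (tie handling omitted for brevity)."""
    ranks = 1 + (neg_score > pos_score.unsqueeze(1)).sum(dim=1)
    return (1.0 / ranks.float()).mean().item()

pos = torch.tensor([2.0, 0.5])
neg = torch.tensor([[1.0, 3.0], [0.1, 0.2]])
print(mean_reciprocal_rank(pos, neg))  # (1/2 + 1/1) / 2 = 0.75
```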
Summary
  • Introduction:

    Graphs are widely used for abstracting complex systems of interacting objects, such as social networks (Easley et al, 2010), knowledge graphs (Nickel et al, 2015), molecular graphs (Wu et al, 2018), and biological networks (Barabasi & Oltvai, 2004), as well as for modeling 3D objects (Simonovsky & Komodakis, 2017), manifolds (Bronstein et al, 2017), and source code (Allamanis et al, 2017).
  • Data duplication and leakage issues have been identified in some widely-used graph datasets (Zou et al, 2020).
  • These issues hinder reliable evaluation and rigorous comparison, which are needed to further graph ML
  • Objectives:

    The authors' aim is for OGB to push the frontier of graph ML research.
  • Methods:

    Accuracy (%) on the training/validation/test splits is reported for the MLP, NODE2VEC, GCN†, and GRAPHSAGE† baselines (table omitted here). The mini-batch versions of GNNs perform worse than their full-batch counterparts, which raises a research opportunity to improve the existing mini-batch training techniques of GNNs.
  • The same trend holds true for the other MOLECULENET datasets, e.g., the best GIN performance on the random split of ogbg-moltox21 is 86.03±1.37% ROC-AUC, which is 8.46 percentage points higher than that of the best GIN on the scaffold split (77.57±0.62% ROC-AUC)
  • These results highlight the challenge of the scaffold split compared to the random split, and open up a fruitful research opportunity to increase the out-of-distribution generalization capability of GNNs
  • The edges are associated with 7-dimensional features, where each element takes a value between 0 and 1 and represents the strength of a particular type of protein-protein association, such as gene co-occurrence, gene fusion events, and co-expression
  • Conclusion:

    The authors' initial benchmarking results in Table 4 show that the highest test performances are attained by GNN architectures, while the MLP baseline that solely relies on a product’s description is not sufficient for accurately predicting the category of a product.
  • The MLP baseline performs extremely poorly, which is to be expected, since the node features in this dataset are not rich
  • Both GNN baselines (GCN, GRAPHSAGE) and NODE2VEC do not overfit the training data and show similar performance across the training/validation/test splits.
  • All four models are able to achieve higher MRR on the training, validation, and test sets, as seen from the bottom half of Table 10
  • This suggests the importance of using sufficient embedding dimensionality to achieve good performance on this dataset.
  • The paper will be updated as more datasets are included in OGB
Tables
  • Table1: Overview of currently-available OGB datasets (denoted in green). Nature domain includes biological networks and molecular graphs, Society domain includes academic graphs and e-commerce networks, and Information domain includes knowledge graphs
  • Table2: Summary of currently-available OGB datasets. An OGB dataset, e.g., ogbg-molhiv, is identified by its prefix (ogbg-) and its name (molhiv). The prefix specifies the category of the graph ML task, i.e., node (ogbn-), link (ogbl-), or graph (ogbg-) property prediction. A realistic split scheme is adopted for each dataset; details can be found in Sections 4, 5, and 6
  • Table3: Statistics of currently-available OGB datasets. All the directed graphs are first converted into undirected ones before the last three graph statistics are calculated using the SNAP library (Leskovec & Sosic, 2016). The diameter is approximated by performing BFS from 1,000 randomly-sampled nodes
  • Table4: Results for ogbn-products. †Requires a GPU with 48GB of memory
  • Table5: Results for ogbn-arxiv
  • Table6: Results for ogbn-proteins
  • Table7: Results for ogbl-ppa
  • Table8: Results for ogbl-collab
  • Table9: Results for ogbl-citation. †Requires a GPU with 48GB of memory. Repeated only 5 times due to resource constraints
  • Table10: Results for ogbl-wikikg. †Requires a GPU with 48GB of memory. Repeated only once due to resource constraints
  • Table11: Results for ogbg-molhiv
  • Table12: Results for ogbg-molpcba
  • Table13: Results for ogbg-ppa
Funding
  • Weihua Hu is supported by Funai Overseas Scholarship and Masason Foundation Fellowship
  • Matthias Fey is supported by the German Research Association (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis”, project A6
  • We gratefully acknowledge the support of DARPA
Study subjects and analysis
proteins from each species: 100
The dataset ogbg-ppa is a set of undirected protein association neighborhoods extracted from the protein-protein association networks of 1,581 different species (Szklarczyk et al, 2019) that cover 37 broad taxonomic groups (e.g., mammals, bacterial families, archaeans) and span the tree of life (Hug et al, 2016). To construct the neighborhoods, we randomly selected 100 proteins from each species and constructed 2-hop protein association neighborhoods centered on each of the selected proteins (Zitnik et al, 2019). We then removed the center node from each neighborhood and subsampled the neighborhood to ensure the final protein association graph is small enough (less than 300 nodes)
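The construction described above amounts to extracting a 2-hop ego network, deleting its center, and capping its size. A minimal NetworkX sketch under those assumptions; species_graph and center are hypothetical stand-ins for a STRING species network and a sampled protein:

```python
import random
import networkx as nx

def extract_neighborhood(species_graph: nx.Graph, center, max_nodes: int = 300) -> nx.Graph:
    """2-hop protein association neighborhood around `center`: take the
    2-hop ego network, remove the center node, and subsample the rest so
    the final graph has fewer than `max_nodes` nodes."""
    ego = nx.ego_graph(species_graph, center, radius=2)
    ego.remove_node(center)
    if ego.number_of_nodes() >= max_nodes:
        keep = random.sample(list(ego.nodes), max_nodes - 1)
        ego = ego.subgraph(keep).copy()
    return ego
```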

graph datasets: 3
In addition to data diversity, OGB supports three categories of fundamental graph ML tasks, i.e., node, link, and graph property prediction, each of which requires the models to make predictions at a different level of the graph, i.e., at the level of a node, a link, or an entire graph, respectively. Currently, OGB includes at least 3 graph datasets for each task category. The currently-available OGB datasets are listed in Table 1. We will update the table as new datasets are released
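The prefix convention maps directly onto how a dataset is loaded; a minimal sketch using the PyTorch Geometric loaders shipped in the ogb package (module and class names as of the time of writing):

```python
from ogb.nodeproppred import PygNodePropPredDataset    # ogbn- datasets
from ogb.linkproppred import PygLinkPropPredDataset    # ogbl- datasets
from ogb.graphproppred import PygGraphPropPredDataset  # ogbg- datasets

# The prefix encodes the prediction level: node, link, or entire graph.
node_ds = PygNodePropPredDataset(name="ogbn-arxiv")
link_ds = PygLinkPropPredDataset(name="ogbl-collab")
graph_ds = PygGraphPropPredDataset(name="ogbg-molhiv")
```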

datasets: 3
4 OGB Node Property Prediction. We currently provide three datasets, adopted from different application domains, for predicting the properties of individual nodes. Specifically, ogbn-products is an Amazon products co-purchasing network (Bhatia et al, 2016)
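A minimal sketch of loading one of these node-level datasets with its standardized split and accuracy evaluator (ogb API names as of the time of writing; the perfect predictions are a placeholder):

```python
from ogb.nodeproppred import PygNodePropPredDataset, Evaluator

dataset = PygNodePropPredDataset(name="ogbn-products")
split_idx = dataset.get_idx_split()  # realistic train/valid/test node indices
data = dataset[0]                    # one large PyG graph

evaluator = Evaluator(name="ogbn-products")  # metric: multi-class accuracy
test_idx = split_idx["test"]
print(evaluator.eval({
    "y_true": data.y[test_idx],
    "y_pred": data.y[test_idx],  # placeholder: perfect predictions
}))  # {'acc': 1.0}
```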

datasets: 3
The three datasets show highly different graph statistics, as shown in Table 3. For example, the biological network, ogbn-proteins, is much denser than the social/information networks (ogbn-arxiv and ogbn-products), as can be seen from its large average node degree and small graph diameter

species: 8
All edges come with 8-dimensional features, where each dimension represents the strength of a single association type and takes values between 0 and 1 (the larger the value, the stronger the association). The proteins come from 8 species.
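In the PyTorch Geometric loader these association strengths surface as the edge_attr tensor; a small sketch inspecting them, assuming the ogb loader used above:

```python
from ogb.nodeproppred import PygNodePropPredDataset

data = PygNodePropPredDataset(name="ogbn-proteins")[0]
print(data.edge_attr.shape)  # (num_edges, 8): one strength per association type
print(data.edge_attr.min().item(), data.edge_attr.max().item())  # within [0, 1]
```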

datasets: 4
5 OGB Link Property Prediction. We currently provide four datasets, adopted from diverse application domains, for predicting the properties of links (pairs of nodes). Specifically, ogbl-ppa is a protein-protein association network (Szklarczyk et al, 2019)
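Link-level datasets expose standardized edge splits rather than node splits; a minimal sketch, assuming the get_edge_split() method of the ogb loaders as of the time of writing:

```python
from ogb.linkproppred import PygLinkPropPredDataset

dataset = PygLinkPropPredDataset(name="ogbl-ppa")
split_edge = dataset.get_edge_split()
# Training positives; validation/test additionally carry negative edges.
print(split_edge["train"]["edge"].shape)  # (num_train_edges, 2)
print(split_edge["valid"].keys())         # includes 'edge' and 'edge_neg'
```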

datasets: 3
6 OGB Graph Property Prediction. We currently provide three datasets, adopted from two distinct application domains, for predicting the properties of entire graphs or subgraphs. Specifically, ogbg-molhiv and ogbg-molpcba are molecular graphs (Wu et al, 2018)
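Since each example here is an entire (sub)graph, training iterates over mini-batches of graphs; a minimal sketch using the standard PyTorch Geometric DataLoader (import path as of recent PyTorch Geometric versions):

```python
from ogb.graphproppred import PygGraphPropPredDataset
from torch_geometric.loader import DataLoader

dataset = PygGraphPropPredDataset(name="ogbg-molhiv")
split_idx = dataset.get_idx_split()
loader = DataLoader(dataset[split_idx["train"]], batch_size=32, shuffle=True)

for batch in loader:
    # `batch` packs many small molecular graphs into one disjoint union;
    # batch.batch maps every node back to its source graph.
    print(batch.num_graphs, batch.x.shape, batch.y.shape)
    break
```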


Reference
  • Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for large-scale machine learning. In Symposium on Operating Systems Design and Implementation (OSDI), pp. 265–283, 2016.
  • Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to represent programs with graphs. arXiv preprint arXiv:1711.00740, 2017.
  • Jürgen Bajorath. Integration of virtual and high-throughput screening. Nature Reviews Drug Discovery, 1(11):882–894, 2002.
  • Albert-Laszlo Barabasi and Zoltan N Oltvai. Network biology: understanding the cell’s functional organization. Nature reviews genetics, 5(2):101–113, 2004.
  • Jon Barker, Ricard Marxer, Emmanuel Vincent, and Shinji Watanabe. The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 504–511. IEEE, 2015.
  • Rianne van den Berg, Thomas N. Kipf, and Max Welling. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263, 2017.
  • Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems (NIPS), pp. 2787–2795, 2013.
  • Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
  • Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. In NeurIPS workshop on Machine Learning Systems, 2015.
  • Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh. Cluster-GCN: An efficient algorithm for training deep and large graph convolutional networks. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 257–266, 2019.
  • Gene Ontology Consortium. The gene ontology resource: 20 years and still going strong. Nucleic acids research, 47(D1):D330–D338, 2018.
  • Lenore Cowen, Trey Ideker, Benjamin J Raphael, and Roded Sharan. Network propagation: a universal amplifier of genetic associations. Nature Reviews Genetics, 18(9):551, 2017.
  • David De Juan, Florencio Pazos, and Alfonso Valencia. Emerging methods in protein co-evolution. Nature Reviews Genetics, 14(4):249–261, 2013.
  • Yuxiao Dong, Hao Ma, Zhihong Shen, and Kuansan Wang. A century of science: Globalization of scientific collaborations, citations, and innovations. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 1437–1446. ACM, 2017.
  • David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems (NIPS), pp. 2224–2232, 2015.
  • Vijay Prakash Dwivedi, Chaitanya K Joshi, Thomas Laurent, Yoshua Bengio, and Xavier Bresson. Benchmarking graph neural networks. arXiv preprint arXiv:2003.00982, 2020.
  • David Easley, Jon Kleinberg, et al. Networks, crowds, and markets, volume 8. Cambridge university press Cambridge, 2010.
  • Federico Errica, Marco Podda, Davide Bacciu, and Alessio Micheli. A fair comparison of graph neural networks for graph classification. arXiv preprint arXiv:1912.09893, 2019.
  • M. Fey and J. E. Lenssen. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
  • Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In International Conference on Machine Learning (ICML), pp. 1263–1272, 2017.
  • Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 855–864. ACM, 2016.
  • William L Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems (NIPS), pp. 1025–1035, 2017a.
  • William L Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods and applications. IEEE Data Engineering Bulletin, 40(3):52–74, 2017b.
  • Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, and Jure Leskovec. Strategies for pre-training graph neural networks. In International Conference on Learning Representations (ICLR), 2020a.
  • Ziniu Hu, Yuxiao Dong, Kuansan Wang, and Yizhou Sun. Heterogeneous graph transformer. In Proceedings of the International World Wide Web Conference (WWW), pp. n/a, 2020b.
  • Laura A Hug, Brett J Baker, Karthik Anantharaman, Christopher T Brown, Alexander J Probst, Cindy J Castelle, Cristina N Butterfield, Alex W Hernsdorf, Yuki Amano, Kotaro Ise, et al. A new view of the tree of life. Nature Microbiology, 1(5):16048, 2016.
  • Katsuhiko Ishiguro, Shin-ichi Maeda, and Masanori Koyama. Graph warp module: An auxiliary module for boosting the power of graph neural networks. arXiv preprint arXiv:1902.01020, 2019.
  • S. Ivanov, S. Sviridov, and E. Burnaev. Understanding isomorphism bias in graph data sets. arXiv preprint arXiv:1910.12091, 2019.
  • Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Hierarchical generation of molecular graphs using structural motifs. arXiv preprint arXiv:2002.03230, 2020.
  • Kristian Kersting, Nils M Kriege, Christopher Morris, Petra Mutzel, and Marion Neumann. Benchmark data sets for graph kernels, 2016. URL http://graphkernels.cs.tu-dortmund.de.
  • Thomas N. Kipf and Max Welling. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016.
  • Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017.
  • Kush Bhatia, Kunal Dahiya, Himanshu Jain, Anshul Mittal, Yashoteja Prabhu, and Manik Varma. The extreme classification repository: Multi-label datasets and code, 2016. URL http://manikvarma.org/downloads/XC/XMLRepository.html.
  • Greg Landrum et al. Rdkit: Open-source cheminformatics, 2006.
  • Adam Lerer, Ledell Wu, Jiajun Shen, Timothee Lacroix, Luca Wehrstedt, Abhijit Bose, and Alex Peysakhovich. Pytorch-biggraph: A large-scale graph embedding system. arXiv preprint arXiv:1903.12287, 2019.
  • Jure Leskovec and Rok Sosic. Snap: A general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology (TIST), 8(1):1–20, 2016.
  • Junying Li, Deng Cai, and Xiaofei He. Learning graph-level representation for drug discovery. arXiv preprint arXiv:1709.03741, 2017.
  • David Liben-Nowell and Jon M. Kleinberg. The link-prediction problem for social networks. Journal of the Association for Information Science and Technology, 58(7):1019–1031, 2007.
  • Sharon L Lohr. Sampling: design and analysis. Nelson Education, 2009.
  • Ricardo Macarron, Martyn N Banks, Dejan Bojanic, David J Burns, Dragan A Cirovic, Tina Garyantes, Darren VS Green, Robert P Hertzberg, William P Janzen, Jeff W Paslay, et al. Impact of high-throughput screening in biomedical research. Nature Reviews Drug discovery, 10(3):188–195, 2011.
  • Noël Malod-Dognin, Kristina Ban, and Nataša Pržulj. Unified alignment of protein-protein interaction networks. Scientific Reports, 7(1):1–11, 2017.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NIPS), pp. 3111–3119, 2013.
  • Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11–33, 2015.
  • Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: an asr corpus based on public domain audio books. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210. IEEE, 2015.
  • Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (NIPS), pp. 8024–8035, 2019.
  • Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 701–710. ACM, 2014.
  • Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Chi Wang, Kuansan Wang, and Jie Tang. Netsmf: Large-scale network embedding as sparse matrix factorization. In Proceedings of the International World Wide Web Conference (WWW), pp. 1509–1520, 2019.
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. Squad: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250, 2016.
  • Roded Sharan, Silpa Suthram, Ryan M Kelley, Tanja Kuhn, Scott McCuine, Peter Uetz, Taylor Sittler, Richard M Karp, and Trey Ideker. Conserved patterns of protein interaction in multiple species. Proceedings of the National Academy of Sciences, 102(6):1974–1979, 2005.
  • Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, and Stephan Günnemann. Pitfalls of graph neural network evaluation. arXiv preprint arXiv:1811.05868, 2018.
  • Martin Simonovsky and Nikos Komodakis. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3693–3702, 2017.
  • Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. Rotate: Knowledge graph embedding by relational rotation in complex space. In International Conference on Learning Representations (ICLR), 2019.
  • Damian Szklarczyk, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, Nadezhda T Doncheva, John H Morris, Peer Bork, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1):D607–D613, 2019.
  • Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. Complex embeddings for simple link prediction. In International Conference on Machine Learning (ICML), pp. 2071–2080, 2016.
  • Petar Velickovic, William Fedus, William L Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm. Deep graph infomax. In International Conference on Learning Representations (ICLR), 2019.
  • Denny Vrandečić and Markus Krötzsch. Wikidata: a free collaborative knowledgebase. Communications of the ACM, 57(10):78–85, 2014.
  • Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman. Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461, 2018.
  • Kuansan Wang, Zhihong Shen, Chiyuan Huang, Chieh-Han Wu, Yuxiao Dong, and Anshul Kanakia. Microsoft academic graph: When experts are not enough. Quantitative Science Studies, 1(1): 396–413, 2020.
  • Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, Ziyue Huang, Qipeng Guo, Hao Zhang, Haibin Lin, Junbo Zhao, Jinyang Li, Alexander J Smola, and Zheng Zhang. Deep graph library: Towards efficient and scalable deep learning on graphs. ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019a. URL https://arxiv.org/abs/1909.01315.
  • Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhiyuan Liu, Juanzi Li, and Jian Tang. Kepler: A unified model for knowledge embedding and pre-trained language representation. arXiv preprint arXiv:1911.06136, 2019b.
  • Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. Moleculenet: a benchmark for molecular machine learning. Chemical science, 9(2):513–530, 2018.
  • Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations (ICLR), 2019.
  • Pinar Yanardag and SVN Vishwanathan. Deep graph kernels. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 1365–1374. ACM, 2015.
  • Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding entities and relations for learning and inference in knowledge bases. In International Conference on Learning Representations (ICLR), 2015.
  • Kevin Yang, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez, Timothy Hopper, Brian Kelley, Miriam Mathea, et al. Analyzing learned molecular representations for property prediction. Journal of Chemical Information and Modeling, 59(8):3370–3388, 2019.
  • Zhilin Yang, William W Cohen, and Ruslan Salakhutdinov. Revisiting semi-supervised learning with graph embeddings. In International Conference on Machine Learning (ICML), pp. 40–48, 2016.
  • Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, William L Hamilton, and Jure Leskovec. Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems (NIPS), 2018.
  • Jiaxuan You, Rex Ying, and Jure Leskovec. Position-aware graph neural networks. In International Conference on Machine Learning (ICML), 2019.
  • David Younger, Stephanie Berger, David Baker, and Eric Klavins. High-throughput characterization of protein–protein interactions by reprogramming yeast mating. Proceedings of the National Academy of Sciences, 114(46):12166–12171, 2017.
  • Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. GraphSaint: Graph sampling based inductive learning method. In International Conference on Learning Representations (ICLR), 2020.
  • Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. In Advances in Neural Information Processing Systems (NIPS), pp. 5165–5175, 2018.
  • Da Zheng, Xiang Song, Chao Ma, Zeyuan Tan, Zihao Ye, Jin Dong, Hao Xiong, Zheng Zhang, and George Karypis. Dgl-ke: Training knowledge graph embeddings at scale. arXiv preprint arXiv:2004.08532, 2020.
  • Zhaocheng Zhu, Shizhen Xu, Jian Tang, and Meng Qu. Graphvite: A high-performance cpugpu hybrid system for node embedding. In Proceedings of the International World Wide Web Conference (WWW), pp. 2494–2504, 2019.
  • Marinka Zitnik, Marcus W Feldman, Jure Leskovec, et al. Evolution of resilience in protein interactomes across the tree of life. Proceedings of the National Academy of Sciences, 116(10): 4426–4433, 2019.
  • Xu Zou, Qiuye Jia, Jianwei Zhang, Chang Zhou, Zijun Yao, Hongxia Yang, and Jie Tang. Dimensional reweighting graph convolution networks, 2020. URL https://openreview.net/forum?id=SJeLO34KwS.