# Contrastive Multi-View Representation Learning on Graphs

ICML 2020


Abstract

We introduce a self-supervised approach for learning node and graph level representations by contrasting structural views of graphs. We show that unlike visual representation learning, increasing the number of views to more than two or contrasting multi-scale encodings do not improve performance, and the best performance is achieved by contrasting encodings from first-order neighbors and a graph diffusion.

Introduction

- Graph neural networks (GNNs) (Li et al., 2015; Gilmer et al., 2017; Kipf & Welling, 2017; Velickovic et al., 2018; Xu et al., 2019b) reconcile the expressive power of graphs in modeling interactions with the unparalleled capacity of deep models in learning representations.
- Recent works on contrastive learning by maximizing mutual information (MI) between node and graph representations have achieved state-of-the-art results on both node classification (Velickovic et al., 2019) and graph classification (Sun et al., 2020) tasks.
- These methods require specialized encoders to learn graph or node level representations

Highlights

- Graph neural networks (GNNs) (Li et al., 2015; Gilmer et al., 2017; Kipf & Welling, 2017; Velickovic et al., 2018; Xu et al., 2019b) reconcile the expressive power of graphs in modeling interactions with the unparalleled capacity of deep models in learning representations
- To further improve contrastive representation learning on node and graph classification tasks, we systematically study the major components of our framework and surprisingly show that, unlike visual contrastive learning: (1) increasing the number of views, i.e., augmentations, to more than two does not improve performance, and the best performance is achieved by contrasting encodings from first-order neighbors and a general graph diffusion; (2) contrasting node and graph encodings across views achieves better results on both tasks than contrasting graph-graph or multi-scale encodings; (3) a simple graph readout layer achieves better performance on both tasks than hierarchical graph pooling methods such as differentiable pooling (DiffPool) (Ying et al., 2018); and (4) applying regularization or normalization layers has a negative effect on performance
- We use three node classification and five graph classification benchmarks widely used in the literature (Kipf & Welling, 2017; Velickovic et al., 2018; 2019; Sun et al., 2020)
- We introduced a self-supervised approach for learning node and graph level representations by contrasting encodings from two structural views of graphs including first-order neighbors and a graph diffusion
- We showed that unlike visual representation learning, increasing the number of views or contrasting multi-scale encodings do not improve the performance
- On Cora, we achieve 86.8% accuracy, which is a 5.5% relative improvement over previous state-of-the-art
- We achieved new state-of-the-art in self-supervised learning on 8 out of 8 node and graph classification benchmarks under the linear evaluation protocol and outperformed strong supervised baselines in 4 out of 8 benchmarks
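
The linear evaluation protocol mentioned above can be sketched as follows: the pretrained encoder is frozen and only a linear classifier (a logistic-regression head) is trained on its embeddings. The sketch below is illustrative and self-contained; the array `X` is a synthetic stand-in for frozen embeddings, and `linear_probe` is a hypothetical helper name, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for frozen embeddings from a pretrained encoder
# (illustrative only; the real protocol uses the learned node/graph embeddings):
X = np.vstack([rng.normal(-1.0, 0.5, (50, 8)), rng.normal(1.0, 0.5, (50, 8))])
y = np.array([0] * 50 + [1] * 50)

def linear_probe(X, y, lr=0.1, epochs=200):
    """Fit only a logistic-regression head; the embeddings X stay frozen."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)        # gradient step on the head only
        b -= lr * (p - y).mean()
    return w, b

w, b = linear_probe(X, y)
accuracy = (((X @ w + b) > 0).astype(int) == y).mean()
```

Accuracy under this protocol measures only the linear separability of the frozen representations, which is why it is a standard proxy for representation quality.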

Methods

- Inspired by recent advances in multi-view contrastive learning for visual representation learning, the approach learns node and graph representations by maximizing MI between the node representations of one view and the graph representation of another view, and vice versa, which achieves better results on both node and graph classification tasks than contrasting global or multi-scale encodings.
- The authors consider two types of augmentations on graphs: (1) feature-space augmentations operating on initial node features, e.g., masking or adding Gaussian noise, and (2) structure-space augmentations and corruptions operating on the graph structure by adding or removing connectivities, sub-sampling, or generating global views using shortest distances or diffusion matrices.
- Node classification baselines include: MLP (Velickovic et al., 2018), ICA (Lu & Getoor, 2003), LP (Zhu et al., 2003), ManiReg (Belkin et al., 2006), SemiEmb (Weston et al., 2012), Planetoid (Yang et al., 2016), Chebyshev (Defferrard et al., 2016), GCN (Kipf & Welling, 2017), MoNet (Monti et al., 2017), JK-Net (Xu et al., 2018), and GAT (Velickovic et al., 2018).
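
As a concrete illustration of the diffusion-matrix view mentioned above, the sketch below computes a Personalized-PageRank (PPR) diffusion, one closed-form instance of graph diffusion (Klicpera et al., 2019b). This is a minimal sketch assuming a column-stochastic transition matrix; the paper's exact normalization and hyperparameters may differ, and `ppr_diffusion` is an illustrative helper name:

```python
import numpy as np

def ppr_diffusion(adj: np.ndarray, alpha: float = 0.15) -> np.ndarray:
    """Personalized-PageRank diffusion: S = alpha * (I - (1 - alpha) * P)^-1,
    where P is the column-stochastic transition matrix of the graph."""
    n = adj.shape[0]
    deg = adj.sum(axis=0)
    p = adj / np.maximum(deg, 1e-12)  # normalize each column by node degree
    return alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * p)

# 4-node path graph: diffusion turns the sparse adjacency into a dense view
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S = ppr_diffusion(A)
```

The resulting S is dense and encodes multi-hop (global) structure, which serves as the second structural view contrasted against the sparse first-order adjacency view.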

Results

- The authors use the Citeseer, Cora, and Pubmed citation networks (Sen et al., 2008), where documents are connected through citations.
- The authors use the following: MUTAG (Kriege & Mutzel, 2012), containing mutagenic compounds; PTC (Kriege & Mutzel, 2012), containing compounds tested for carcinogenicity; Reddit-Binary (Yanardag & Vishwanathan, 2015), connecting users through responses in Reddit online discussions; and IMDB-Binary and IMDB-Multi (Yanardag & Vishwanathan, 2015), connecting actors/actresses based on movie appearances.

Conclusion

- The authors introduced a self-supervised approach for learning node and graph level representations by contrasting encodings from two structural views of graphs including first-order neighbors and a graph diffusion.
- The authors showed that unlike visual representation learning, increasing the number of views or contrasting multi-scale encodings do not improve the performance.
- Using these findings, the authors achieved new state-of-the-art in self-supervised learning on 8 out of 8 node and graph classification benchmarks under the linear evaluation protocol and outperformed strong supervised baselines in 4 out of 8 benchmarks.
- The authors plan to investigate large-scale pre-training and the transfer learning capabilities of the proposed method.

Tables

- Table 1: Statistics of classification benchmarks
- Table 2: Mean node classification accuracy for supervised and unsupervised models. The input column highlights the data available to each model during training (X: features, A: adjacency matrix, S: diffusion matrix, Y: labels)
- Table 3: Performance on the node clustering task, reported in normalized MI (NMI) and adjusted Rand index (ARI) measures
- Table 4: Mean 10-fold cross-validation accuracy on graphs for kernel, supervised, and unsupervised methods
- Table 5: Effect of the MI estimator, contrastive mode, and views on accuracy for both node and graph classification tasks

Related work

- Unsupervised Representation Learning on Graphs

Random walks (Perozzi et al., 2014; Tang et al., 2015; Grover & Leskovec, 2016; Hamilton et al., 2017) flatten graphs into sequences by taking random walks across nodes and use language models to learn node representations. They are shown to over-emphasize proximity information at the expense of structural information (Velickovic et al., 2019; Ribeiro et al., 2017). Also, they are limited to transductive settings and cannot use node features (You et al., 2019). Graph kernels (Borgwardt & Kriegel, 2005; Shervashidze et al., 2009; 2011; Yanardag & Vishwanathan, 2015; Kondor & Pan, 2016; Kriege et al., 2016) decompose graphs into substructures and use kernel functions to measure graph similarity between them. Nevertheless, they require the non-trivial task of devising similarity measures between substructures. Graph autoencoders (GAE) (Kipf & Welling, 2016; Garcia Duran & Niepert, 2017; Wang et al., 2017; Pan et al., 2018; Park et al., 2019) train encoders that impose the topological closeness of nodes in the graph structure on the latent space by predicting first-order neighbors. GAEs over-emphasize proximity information (Velickovic et al., 2019) and suffer from unstructured predictions (Tian et al., 2019).

Funding

- We achieve new state-of-the-art results in self-supervised learning on 8 out of 8 node and graph classification benchmarks under the linear evaluation protocol
- On Cora (node) and Reddit-Binary (graph) classification benchmarks, we achieve 86.8% and 84.5% accuracy, which are 5.5% and 2.4% relative improvements over previous state-of-the-art
- To further improve contrastive representation learning on node and graph classification tasks, we systematically study the major components of our framework and surprisingly show that, unlike visual contrastive learning: (1) increasing the number of views, i.e., augmentations, to more than two does not improve performance, and the best performance is achieved by contrasting encodings from first-order neighbors and a general graph diffusion; (2) contrasting node and graph encodings across views achieves better results on both tasks than contrasting graph-graph or multi-scale encodings; (3) a simple graph readout layer achieves better performance on both tasks than hierarchical graph pooling methods such as differentiable pooling (DiffPool) (Ying et al., 2018); and (4) applying regularization (except early-stopping) or normalization layers has a negative effect on performance. Using these findings, we achieve new state-of-the-art in self-supervised learning on 8 out of 8 node and graph classification benchmarks under the linear evaluation protocol
- On the Cora node classification benchmark, our approach achieves 86.8% accuracy, which is a 5.5% relative improvement over the previous state-of-the-art (Velickovic et al., 2019), and on the Reddit-Binary graph classification benchmark, it achieves 84.5% accuracy, i.e., a 2.4% relative improvement over the previous state-of-the-art (Sun et al., 2020)
- The results reported in Table 2 show that we achieve state-of-the-art results with respect to previous unsupervised models
- The results shown in Table 3 suggest that our model achieves state-of-the-art NMI and ARI scores across all benchmarks
- The results shown in Table 4 suggest that our approach achieves state-of-the-art results with respect to unsupervised models
- On Reddit-Binary (Yanardag & Vishwanathan, 2015), it achieves 84.5% accuracy, i.e., a 2.4% relative improvement over the previous state-of-the-art
- It is noteworthy that we achieve state-of-the-art results on both node and graph classification benchmarks using a unified approach; unlike previous unsupervised models (Velickovic et al., 2019; Sun et al., 2020), we do not devise a specialized encoder for each task
- Furthermore, we investigated whether increasing the number of views monotonically increases performance on downstream tasks
- We observed that applying the former achieves significantly better results compared to the latter or a combination of both
- We showed that unlike visual representation learning, increasing the number of views or contrasting multi-scale encodings does not improve performance. Using these findings, we achieved new state-of-the-art results in self-supervised learning on 8 out of 8 node and graph classification benchmarks under the linear evaluation protocol and outperformed strong supervised baselines on 4 out of 8 benchmarks
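
Note that "relative improvement" here is the gain divided by the previous score, not an absolute gain in accuracy points. Back-solving from the figures above (a hedged arithmetic check, not data from the paper) implies prior state-of-the-art scores of roughly 82.3% on Cora and 82.5% on Reddit-Binary:

```python
def relative_improvement(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0

# Back-solve the implied previous state-of-the-art from the reported
# relative improvements (illustrative arithmetic only):
prev_cora = 86.8 / 1.055    # implied prior Cora accuracy, ~82.3%
prev_reddit = 84.5 / 1.024  # implied prior Reddit-Binary accuracy, ~82.5%
```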

Study subjects and analysis

datasets: 5

For example, on Reddit-Binary (Yanardag & Vishwanathan, 2015), it achieves 84.5% accuracy, i.e., a 2.4% relative improvement over the previous state-of-the-art. The model also outperforms kernel methods on 4 out of 5 datasets and the best supervised model on one of them. Compared to supervised baselines individually, it outperforms GCN and GAT on 3 out of 5 datasets, e.g., a 5.3% relative improvement over GAT on the IMDB-Binary dataset. It is noteworthy that the state-of-the-art results on both node and graph classification benchmarks are achieved with a unified approach: unlike previous unsupervised models (Velickovic et al., 2019; Sun et al., 2020), no specialized encoder is devised for each task.

datasets: 3

We investigated four MI estimators: noise-contrastive estimation (NCE) (Gutmann & Hyvarinen, 2010; Oord et al., 2018), the Jensen-Shannon (JSD) estimator following the formulation in (Nowozin et al., 2016), normalized temperature-scaled cross-entropy (NT-Xent) (Chen et al., 2020), and the Donsker-Varadhan (DV) representation of the KL-divergence (Donsker & Varadhan, 1975). The results shown in Table 5 suggest that the Jensen-Shannon estimator achieves better results across all graph classification benchmarks, whereas on node classification benchmarks, NCE achieves better results on 2 out of 3 datasets.
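
The estimators above score positive (true) view pairs against negative (corrupted) pairs. The sketch below shows simplified scalar-score forms of two of them, the JSD bound in the f-GAN formulation (Nowozin et al., 2016) and NT-Xent (Chen et al., 2020); the paper applies these to discriminator scores between node and graph encodings, and the function names here are illustrative:

```python
import numpy as np

def softplus(x):
    return np.logaddexp(0.0, x)  # numerically stable log(1 + e^x)

def jsd_mi(pos_scores, neg_scores):
    """Jensen-Shannon MI lower bound (f-GAN form): larger when positive
    pairs score high and negative pairs score low."""
    return -softplus(-pos_scores).mean() - softplus(neg_scores).mean()

def nt_xent(pos_score, neg_scores, tau=0.5):
    """NT-Xent loss for one anchor: softmax cross-entropy of the positive
    against the negatives, scaled by temperature tau."""
    logits = np.concatenate([[pos_score], neg_scores]) / tau
    return float(np.log(np.exp(logits).sum()) - logits[0])

# Discriminator scores for well-separated positive/negative pairs:
pos, neg = np.array([3.0, 2.5]), np.array([-2.0, -3.0])
```

Maximizing `jsd_mi` (or minimizing `nt_xent`) pushes the discriminator to separate true cross-view pairs from corrupted ones, which is how the MI objective trains the encoder.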

Reference

- Adhikari, B., Zhang, Y., Ramakrishnan, N., and Prakash, B. A. Sub2vec: Feature learning for subgraphs. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 170–182, 2018.
- Ba, J. L., Kiros, J. R., and Hinton, G. E. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
- Bachman, P., Hjelm, R. D., and Buchwalter, W. Learning representations by maximizing mutual information across views. In Advances in Neural Information Processing Systems, pp. 15509–15519, 2019.
- Belkin, M., Niyogi, P., and Sindhwani, V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7:2399–2434, 2006.
- Borgwardt, K. M. and Kriegel, H.-P. Shortest-path kernels on graphs. In International Conference on Data Mining, 2005.
- Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709, 2020.
- Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852. 2016.
- Donsker, M. D. and Varadhan, S. S. Asymptotic evaluation of certain markov process expectations for large time. Communications on Pure and Applied Mathematics, 28 (1):1–47, 1975.
- Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., Aspuru-Guzik, A., and Adams, R. P. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pp. 2224–2232, 2015.
- Garcia Duran, A. and Niepert, M. Learning graph representations with embedding propagation. In Advances in Neural Information Processing Systems, pp. 5119–5130. 2017.
- Gartner, T., Flach, P., and Wrobel, S. On graph kernels: Hardness results and efficient alternatives. In Learning theory and kernel machines, pp. 129–143. 2003.
- Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. In International Conference on Machine Learning, pp. 1263–1272, 2017.
- Glorot, X. and Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In International Conference on Artificial Intelligence and Statistics, pp. 249–256, 2010.
- Grover, A. and Leskovec, J. node2vec: Scalable feature learning for networks. In International Conference on Knowledge Discovery and Data Mining, pp. 855–864, 2016.
- Gutmann, M. and Hyvarinen, A. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In International Conference on Artificial Intelligence and Statistics, pp. 297–304, 2010.
- Hamilton, W., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034, 2017.
- Hassani, K. and Haley, M. Unsupervised multi-task feature learning on point clouds. In International Conference on Computer Vision, pp. 8160–8171, 2019.
- He, K., Zhang, X., Ren, S., and Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In International Conference on Computer Vision, pp. 1026–1034, 2015.
- Hjelm, R. D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., and Bengio, Y. Learning deep representations by mutual information estimation and maximization. In International Conference on Learning Representations, 2019.
- Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pp. 448–456, 2015.
- Jiang, B., Lin, D., Tang, J., and Luo, B. Data representation and learning with graph diffusion-embedding networks. In Conference on Computer Vision and Pattern Recognition, pp. 10414–10423, 2019.
- Khasahmadi, A., Hassani, K., Moradi, P., Lee, L., and Morris, Q. Memory-based graph networks. In International Conference on Learning Representations, 2020.
- Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2014.
- Kipf, T. N. and Welling, M. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016.
- Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017.
- Klicpera, J., Bojchevski, A., and Gunnemann, S. Combining neural networks with personalized pagerank for classification on graphs. In International Conference on Learning Representations, 2019a.
- Klicpera, J., Weißenberger, S., and Gunnemann, S. Diffusion improves graph learning. In Advances in Neural Information Processing Systems, pp. 13333–13345. 2019b.
- Kondor, R. and Pan, H. The multiscale laplacian graph kernel. In Advances in Neural Information Processing Systems, pp. 2990–2998. 2016.
- Kondor, R. I. and Lafferty, J. Diffusion kernels on graphs and other discrete structures. In International Conference on Machine Learning, pp. 315–322, 2002.
- Kriege, N. and Mutzel, P. Subgraph matching kernels for attributed graphs. In International Conference on Machine Learning, pp. 291–298, 2012.
- Kriege, N. M., Giscard, P.-L., and Wilson, R. On valid optimal assignment kernels and applications to graph classification. In Advances in Neural Information Processing Systems, pp. 1623–1631, 2016.
- Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. Gated graph sequence neural networks. In International Conference on Learning Representations, 2015.
- Li, Y., Gu, C., Dullien, T., Vinyals, O., and Kohli, P. Graph matching networks for learning the similarity of graph structured objects. In International Conference on Machine Learning, pp. 3835–3845, 2019.
- Linsker, R. Self-organization in a perceptual network. Computer, 21(3):105–117, 1988.
- Lu, Q. and Getoor, L. Link-based classification. In International Conference on Machine Learning, pp. 496–503, 2003.
- Monti, F., Boscaini, D., Masci, J., Rodola, E., Svoboda, J., and Bronstein, M. M. Geometric deep learning on graphs and manifolds using mixture model cnns. In Conference on Computer Vision and Pattern Recognition, 2017.
- Narayanan, A., Chandramohan, M., Venkatesan, R., Chen, L., Liu, Y., and Jaiswal, S. graph2vec: Learning distributed representations of graphs. arXiv preprint arXiv:1707.05005, 2017.
- Nowozin, S., Cseke, B., and Tomioka, R. f-gan: Training generative neural samplers using variational divergence minimization. In Advances in Neural Information Processing Systems, pp. 271–279. 2016.
- Oord, A. v. d., Li, Y., and Vinyals, O. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
- Page, L., Brin, S., Motwani, R., and Winograd, T. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
- Pan, S., Hu, R., Long, G., Jiang, J., Yao, L., and Zhang, C. Adversarially regularized graph autoencoder for graph embedding. In International Joint Conference on Artificial Intelligence, pp. 2609–2615, 2018.
- Park, J., Lee, M., Chang, H. J., Lee, K., and Choi, J. Y. Symmetric graph convolutional autoencoder for unsupervised graph representation learning. In International Conference on Computer Vision, pp. 6519–6528, 2019.
- Perozzi, B., Al-Rfou, R., and Skiena, S. Deepwalk: Online learning of social representations. In International Conference on Knowledge Discovery and Data Mining, pp. 701–710, 2014.
- Ribeiro, L., Saverese, P., and Figueiredo, D. Struc2vec: Learning node representations from structural identity. In International Conference on Knowledge Discovery and Data Mining, pp. 385–394, 2017.
- Sanchez-Gonzalez, A., Heess, N., Springenberg, J. T., Merel, J., Riedmiller, M., Hadsell, R., and Battaglia, P. Graph networks as learnable physics engines for inference and control. In International Conference on Machine Learning, pp. 4470–4479, 2018.
- Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., and Eliassi-Rad, T. Collective classification in network data. AI Magazine, 29(3):93–93, 2008.
- Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., and Borgwardt, K. Efficient graphlet kernels for large graph comparison. In Artificial Intelligence and Statistics, pp. 488–495, 2009.
- Shervashidze, N., Schweitzer, P., Leeuwen, E. J. v., Mehlhorn, K., and Borgwardt, K. M. Weisfeiler-lehman graph kernels. Journal of Machine Learning Research, 12:2539–2561, 2011.
- Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(56):1929–1958, 2014.
- Sun, F.-Y., Hoffman, J., Verma, V., and Tang, J. Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. In International Conference on Learning Representations, 2020.
- Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., and Mei, Q. Line: Large-scale information network embedding. In International Conference on World Wide Web, pp. 1067– 1077, 2015.
- Tian, Y., Krishnan, D., and Isola, P. Contrastive multiview coding. arXiv preprint arXiv:1906.05849, 2019.
- Tschannen, M., Djolonga, J., Rubenstein, P. K., Gelly, S., and Lucic, M. On mutual information maximization for representation learning. In International Conference on Learning Representations, 2020.
- Tsitsulin, A., Mottin, D., Karras, P., and Muller, E. Verse: Versatile graph embeddings from similarity measures. In International Conference on World Wide Web, pp. 539–548, 2018.
- Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. Graph attention networks. In International Conference on Learning Representations, 2018.
- Velickovic, P., Fedus, W., Hamilton, W. L., Lio, P., Bengio, Y., and Hjelm, R. D. Deep graph infomax. In International Conference on Learning Representations, 2019.
- Vivona, S. and Hassani, K. Relational graph representation learning for open-domain question answering. Advances in Neural Information Processing Systems, Graph Representation Learning Workshop, 2019.
- Wang, C., Pan, S., Long, G., Zhu, X., and Jiang, J. Mgae: Marginalized graph autoencoder for graph clustering. In Conference on Information and Knowledge Management, pp. 889–898, 2017.
- Wang, N., Zhang, Y., Li, Z., Fu, Y., Liu, W., and Jiang, Y.-G. Pixel2mesh: Generating 3d mesh models from single rgb images. In European Conference on Computer Vision, pp. 52–67, 2018.
- Wang, T., Zhou, Y., Fidler, S., and Ba, J. Neural graph evolution: Automatic robot design. In International Conference on Learning Representations, 2019.
- Weston, J., Ratle, F., Mobahi, H., and Collobert, R. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pp. 639–655. 2012.
- Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Yu, P. S. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2020.
- Xu, B., Shen, H., Cao, Q., Cen, K., and Cheng, X. Graph convolutional networks using heat kernel for semisupervised learning. In International Joint Conference on Artificial Intelligence, pp. 1928–1934, 2019a.
- Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K.-i., and Jegelka, S. Representation learning on graphs with jumping knowledge networks. In International Conference on Machine Learning, pp. 5453–5462, 2018.
- Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? In International Conference on Learning Representations, 2019b.
- Yanardag, P. and Vishwanathan, S. V. N. Deep graph kernels. In International Conference on Knowledge Discovery and Data Mining, pp. 1365–1374, 2015.
- Yang, Z., Cohen, W., and Salakhutdinov, R. Revisiting semi-supervised learning with graph embeddings. In International Conference on Machine Learning, pp. 40–48, 2016.
- Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W., and Leskovec, J. Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems, pp. 4800–4810, 2018.
- You, J., Ying, R., and Leskovec, J. Position-aware graph neural networks. In International Conference on Machine Learning, pp. 7134–7143, 2019.
- Zhang, Z., Cui, P., and Zhu, W. Deep learning on graphs: A survey. IEEE Transactions on Knowledge and Data Engineering, 2020.
- Zhu, X., Ghahramani, Z., and Lafferty, J. D. Semisupervised learning using gaussian fields and harmonic functions. In International Conference on Machine Learning, pp. 912–919, 2003.
