Bayesian Graph Neural Networks with Adaptive Connection Sampling

Arman Hasanzadeh
Ehsan Hajiramezanali
Shahin Boluki

ICML, pp. 4094-4104, 2020.


Abstract

We propose a unified framework for adaptive connection sampling in graph neural networks (GNNs) that generalizes existing stochastic regularization methods for training GNNs. The proposed framework not only alleviates over-smoothing and over-fitting tendencies of deep GNNs, but also enables learning with uncertainty in graph analytic tasks with GNNs.

Introduction
  • Graph neural networks (GNNs), and their numerous variants, have been shown to be successful in graph representation learning by extracting high-level features for nodes from their topological neighborhoods.
  • Empirical results have shown that, due to the nature of Laplacian smoothing in GNNs, graph convolutions have the over-smoothing tendency of mixing representations of adjacent nodes, so that, as the number of GNN layers increases, all node representations converge to a stationary point and become unrelated to node features (Li et al., 2018).
  • While it has been shown in Kipf & Welling (2017) that DropOut alone is ineffectual in preventing over-fitting, partially due to over-smoothing, the combination of DropEdge, in which a set of edges is randomly removed from the graph, with DropOut has recently shown potential to alleviate these problems (Rong et al., 2019).
Highlights
  • Graph neural networks (GNNs), and their numerous variants, have been shown to be successful in graph representation learning by extracting high-level features for nodes from their topological neighborhoods.
  • We propose a general stochastic regularization technique for graph neural networks—Graph DropConnect (GDC)—by adaptive connection sampling, which can be interpreted as an approximation of Bayesian graph neural networks
  • We show the performance of Graph DropConnect compared to existing methods in alleviating the issue of over-smoothing in graph neural networks
  • We propose a unified framework for adaptive connection sampling in graph neural networks that generalizes existing stochastic regularization techniques for training graph neural networks.
  • We further show that training a graph neural network with Graph DropConnect is equivalent to an approximation of training a Bayesian graph neural network.
  • We further show that the quality of uncertainty derived by Graph DropConnect is better than that of DropOut in graph neural networks.
Methods
  • While in the GDC formulation, as shown in (4) and (5), the normalization N(·) is applied after masking, one can instead multiply the randomly drawn mask with the pre-computed normalized adjacency matrix.
  • This relaxation reduces computation time and, in the authors' experiments, has a negligible effect on performance.
  • The authors use asymmetric masks and multiply the mask with the normalized adjacency matrix; see the sketch after this list.
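The following is a minimal sketch of this relaxed masking step for a single GNN layer, assuming a dense adjacency matrix, a fixed Bernoulli drop rate, and one mask shared across feature blocks; in the full GDC method, masks are drawn per connection and per output feature block and the drop rates are learned. Function names (`normalize_adjacency`, `gdc_layer`) are illustrative, not from the authors' code.

```python
# A minimal sketch of GDC-style connection sampling for one GNN layer,
# assuming a dense adjacency matrix, a fixed Bernoulli drop rate, and a
# single mask shared across feature blocks. Names and shapes are illustrative.
import torch
import torch.nn.functional as F


def normalize_adjacency(adj: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, pre-computed once."""
    adj_hat = adj + torch.eye(adj.size(0), device=adj.device)
    deg_inv_sqrt = adj_hat.sum(dim=1).pow(-0.5)
    return deg_inv_sqrt.unsqueeze(1) * adj_hat * deg_inv_sqrt.unsqueeze(0)


def gdc_layer(adj_norm, h, weight, drop_prob=0.5, training=True):
    """One graph convolution with a random per-connection (edge) mask.

    The mask is multiplied with the already-normalized adjacency, matching
    the relaxation described above (normalize first, then mask).
    """
    if training:
        # Asymmetric mask: each directed connection is kept independently.
        keep_prob = 1.0 - drop_prob
        mask = torch.bernoulli(torch.full_like(adj_norm, keep_prob))
        adj_used = adj_norm * mask / keep_prob  # inverted-dropout rescaling
    else:
        adj_used = adj_norm
    return F.relu(adj_used @ h @ weight)
```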
Results
  • The authors test the performance of the adaptive connection sampling framework, learnable GDC, on semi-supervised node classification using real-world citation graphs.
  • The authors compare the uncertainty estimates of predictions by Monte Carlo beta-Bernoulli GDC and Monte Carlo Dropout; a sketch of such Monte Carlo uncertainty estimation follows this list.
  • The authors show the performance of GDC compared to existing methods in alleviating the issue of over-smoothing in GNNs, and investigate the effect of the number of blocks on the performance of GDC.
  • The authors have also investigated learning separate drop rates for every edge in the network, i.e. local GDC, which is included in the supplementary materials.
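As a rough illustration of how such uncertainty estimates can be obtained, the sketch below averages the softmax outputs of several stochastic forward passes and uses predictive entropy as a per-node uncertainty score; `model`, `adj_norm`, `features`, and `num_samples` are placeholder names, and the evaluation protocol used in the paper may differ.

```python
# A minimal sketch of Monte Carlo uncertainty estimation at test time,
# assuming `model` keeps its stochastic masks active while in train mode
# (as in MC Dropout). `model`, `adj_norm`, `features`, and `num_samples`
# are placeholder names, not identifiers from the authors' code.
import torch


@torch.no_grad()
def mc_predict(model, adj_norm, features, num_samples=50):
    model.train()  # keep connection/unit sampling active during prediction
    probs = torch.stack(
        [torch.softmax(model(adj_norm, features), dim=-1) for _ in range(num_samples)]
    )
    mean_probs = probs.mean(dim=0)  # averaged predictive distribution
    # Predictive entropy as a simple per-node uncertainty score.
    entropy = -(mean_probs * torch.log(mean_probs + 1e-12)).sum(dim=-1)
    return mean_probs, entropy
```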
Conclusion
  • For example, on Citeseer, a 4-layer GCN shows a significant decrease in performance compared to a 2-layer GCN. In this paper, the authors proposed a unified framework for adaptive connection sampling in GNNs that generalizes existing stochastic regularization techniques for training GNNs. The authors' proposed method, Graph DropConnect (GDC), alleviates over-smoothing and over-fitting tendencies of deep GNNs, and enables learning with uncertainty in graph analytic tasks with GNNs. Instead of using fixed sampling rates, the GDC technique's parameters can be trained jointly with the GNN model parameters; a sketch of this joint-training idea is given below.
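To illustrate what it means for drop rates to be trained jointly with the model, the sketch below makes the keep probability a learnable parameter and draws a relaxed-Bernoulli (binary concrete) mask over the normalized adjacency so that gradients flow to the keep probability. This is only an assumption-laden illustration: the paper itself uses a beta-Bernoulli construction and also ARM gradient estimates rather than this exact relaxation, and `LearnableEdgeMask` is a hypothetical name.

```python
# A sketch of a trainable keep probability using a relaxed-Bernoulli
# (binary concrete) mask over the normalized adjacency, so the drop rate
# receives gradients and is optimized jointly with the GNN weights. The
# paper itself uses a beta-Bernoulli construction (and ARM gradients); this
# class and its names are hypothetical illustrations of the joint-training idea.
import torch
import torch.nn as nn


class LearnableEdgeMask(nn.Module):
    def __init__(self, temperature: float = 0.67):
        super().__init__()
        # sigmoid(0.0) = 0.5, i.e. start from a 50% keep rate.
        self.logit_keep = nn.Parameter(torch.tensor(0.0))
        self.temperature = temperature

    def forward(self, adj_norm: torch.Tensor) -> torch.Tensor:
        keep_prob = torch.sigmoid(self.logit_keep)
        if self.training:
            u = torch.rand_like(adj_norm)
            # Relaxed Bernoulli sample, differentiable w.r.t. keep_prob.
            logits = (torch.log(keep_prob) - torch.log1p(-keep_prob)
                      + torch.log(u) - torch.log1p(-u))
            mask = torch.sigmoid(logits / self.temperature)
            return adj_norm * mask
        # Use the expected mask at evaluation time.
        return adj_norm * keep_prob
```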
Tables
  • Table1: Semi-supervised node classification accuracy of GCNs with our adaptive connection sampling and baseline methods
  • Table2: Accuracy of ARM optimization-based variants of our proposed method in semi-supervised node classification
  • Table3: Accuracy of 128-dimensional 4-layer GCN-BBGDC with different numbers of blocks on Cora in semi-supervised node classification
  • Table4: Graph dataset statistics
Funding
  • The presented materials are based upon work supported by the National Science Foundation under Grants ENG-1839816, IIS-1848596, CCF-1553281, IIS-1812641, IIS-1812699, and CCF-1934904.
References
  • Bojchevski, A. and Günnemann, S. Deep Gaussian embedding of graphs: Unsupervised inductive learning via ranking. In International Conference on Learning Representations, 2018.
  • Boluki, S., Ardywibowo, R., Dadaneh, S. Z., Zhou, M., and Qian, X. Learnable Bernoulli dropout for Bayesian deep learning. arXiv preprint arXiv:2002.05155, 2020.
  • Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., and Bengio, S. Generating sentences from a continuous space. arXiv preprint arXiv:1511.06349, 2015.
  • Chen, J., Ma, T., and Xiao, C. FastGCN: Fast learning with graph convolutional networks via importance sampling. arXiv preprint arXiv:1801.10247, 2018.
  • Chen, S., Sandryhaila, A., Moura, J. M., and Kovacevic, J. Signal recovery on graphs: Variation minimization. IEEE Transactions on Signal Processing, 63(17):4609–4624, 2015.
  • Dadaneh, S. Z., Boluki, S., Yin, M., Zhou, M., and Qian, X. Pairwise supervised hashing with Bernoulli variational auto-encoder and self-control gradient estimator. arXiv preprint arXiv:2005.10477, 2020a.
  • Dadaneh, S. Z., Boluki, S., Zhou, M., and Qian, X. ARSM gradient estimator for supervised learning to rank. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3157–3161, 2020b.
  • Fu, M. C. Gradient estimation. Handbooks in Operations Research and Management Science, 13:575–616, 2006.
  • Gal, Y. and Ghahramani, Z. Bayesian convolutional neural networks with Bernoulli approximate variational inference. arXiv preprint arXiv:1506.02158, 2015.
  • Gal, Y. and Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pp. 1050–1059, 2016.
  • Gal, Y., Hron, J., and Kendall, A. Concrete dropout. In Advances in Neural Information Processing Systems, pp. 3581–3590, 2017.
  • Ghahramani, Z. and Griffiths, T. L. Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems, pp. 475–482, 2006.
  • Goyal, A. G. A. P., Sordoni, A., Côté, M.-A., Ke, N. R., and Bengio, Y. Z-forcing: Training stochastic recurrent networks. In Advances in Neural Information Processing Systems, pp. 6713–6723, 2017.
  • Hajiramezanali, E., Dadaneh, S. Z., Karbalayghareh, A., Zhou, M., and Qian, X. Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data. In Advances in Neural Information Processing Systems, pp. 9115–9124, 2018.
  • Hajiramezanali, E., Hasanzadeh, A., Duffield, N., Narayanan, K. R., Zhou, M., and Qian, X. Variational graph recurrent neural networks. In Advances in Neural Information Processing Systems, 2019.
  • Hajiramezanali, E., Hasanzadeh, A., Duffield, N., Narayanan, K., Zhou, M., and Qian, X. Semi-implicit stochastic recurrent neural networks. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3342–3346. IEEE, 2020.
  • Hamilton, W., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034, 2017.
  • Hasanzadeh, A., Hajiramezanali, E., Duffield, N., Narayanan, K. R., Zhou, M., and Qian, X. Semi-implicit graph variational auto-encoders. In Advances in Neural Information Processing Systems, 2019.
  • Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
  • Jang, E., Gu, S., and Poole, B. Categorical reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144, 2016.
  • Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
  • Kingma, D. P., Salimans, T., and Welling, M. Variational dropout and the local reparameterization trick. In Advances in Neural Information Processing Systems, pp. 2575–2583, 2015.
  • Kipf, T. N. and Welling, M. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016.
  • Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017.
  • Kumaraswamy, P. A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46(1-2):79–88, 1980.
  • Li, Q., Han, Z., and Wu, X.-M. Deeper insights into graph convolutional networks for semi-supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • Liu, X., Gao, J., Celikyilmaz, A., Carin, L., et al. Cyclical annealing schedule: A simple approach to mitigating KL vanishing. arXiv preprint arXiv:1903.10145, 2019.
  • Ma, Y.-A., Chen, T., and Fox, E. A complete recipe for stochastic gradient MCMC. In Advances in Neural Information Processing Systems, pp. 2917–2925, 2015.
  • MacKay, D. J. Bayesian methods for adaptive models. PhD thesis, California Institute of Technology, 1992.
  • Mukhoti, J. and Gal, Y. Evaluating Bayesian deep learning methods for semantic segmentation. arXiv preprint arXiv:1811.12709, 2018.
  • Neal, R. M. Bayesian learning for neural networks, volume 118. Springer Science & Business Media, 2012.
  • Paisley, J., Blei, D., and Jordan, M. Variational Bayesian inference with stochastic search. arXiv preprint arXiv:1206.6430, 2012.
  • Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. Automatic differentiation in PyTorch. In NIPS-W, 2017.
  • Rezende, D. J., Mohamed, S., and Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082, 2014.
  • Rong, Y., Huang, W., Xu, T., and Huang, J. DropEdge: Towards the very deep graph convolutional networks for node classification, 2019.
  • Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.
  • Thibaux, R. and Jordan, M. I. Hierarchical beta processes and the Indian buffet process. In Artificial Intelligence and Statistics, pp. 564–571, 2007.
  • Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.
  • Yin, M. and Zhou, M. ARM: Augment-REINFORCE-Merge gradient for stochastic binary networks. In International Conference on Learning Representations, 2019.
  • Zhang, Y., Pal, S., Coates, M., and Ustebay, D. Bayesian graph convolutional neural networks for semi-supervised classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 5829–5836, 2019.
  • Zhou, M., Chen, H., Ren, L., Sapiro, G., Carin, L., and Paisley, J. W. Non-parametric Bayesian dictionary learning for sparse image representations. In Advances in Neural Information Processing Systems, pp. 2295–2303, 2009.