
# Stochastic Deep Gaussian Processes over Graphs

NeurIPS 2020

Abstract

In this paper we propose Stochastic Deep Gaussian Processes over Graphs (DGPG), which are deep Gaussian models that learn the mappings between input and output signals in graph domains. The approximate posterior distributions of the latent variables are derived with variational inference, and the evidence lower bound is evaluated and opti…

Introduction

- Gaussian processes (GPs) [1] are a favourable choice in the machine learning arsenal, owing to their distinguishing advantages: they model uncertainties, they allow expert knowledge to be introduced through flexible kernel design, and they are data efficient, which accounts for their success on small and medium datasets.
- A series of works on deep GPs were presented [7,8,9], which overcome the aforementioned disadvantages.
- All these methods can find their roots in the seminal paper [10], which proposes to summarize all the information in the original large dataset with a manageably sized set of inducing points; the inducing points and their latent function values can be inferred by variational inference.
- A lot of work has emerged to assuage the complexity burden [22,23,24,25,26], among which sparse Gaussian processes stand out as some of the most successful and popular methods [27,28,29,30,31,32].
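To make the uncertainty-modeling advantage concrete, the following is a minimal NumPy sketch of exact GP regression with an RBF kernel; it is illustrative only (the `rbf` helper, the toy `sin` data, and all sizes are ours, not the paper's):

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential (RBF) kernel between 1-D input arrays A and B."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 30)                      # training inputs
y = np.sin(X) + 0.1 * rng.standard_normal(30)  # noisy training targets
Xs = np.linspace(0, 5, 100)                    # test inputs
noise = 0.1**2                                 # observation-noise variance

K = rbf(X, X) + noise * np.eye(30)
Ks = rbf(Xs, X)
Kss = rbf(Xs, Xs)

# Standard GP predictive equations: mean and covariance of f(Xs) given y.
alpha = np.linalg.solve(K, y)
mean = Ks @ alpha
cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # per-point predictive std
```

Note that exact inference here requires solving against the full N x N matrix `K`, which is the O(N^3) bottleneck that the sparse and inducing-point methods cited above are designed to avoid.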

Highlights

- In this paper we propose Stochastic Deep Gaussian Processes over Graphs (DGPG), which is a method for modeling the relations between input and output signals over graph-structured domains
- We summarize the main contributions of this paper as follows: 1) we propose a novel Bayesian non-parametric method called Stochastic Deep Gaussian Processes over Graphs (DGPG) to model the relations of input/output signals over graphs; 2) it is rigorously proved that, under some technical assumptions, the sampling variances of DGPG are strictly smaller than those of [11], implying that DGPG achieves faster convergence by exploiting graph information; 3) we performed experiments on a synthetic dataset; 4) we show that our method outperforms a recent GP-based graph learning algorithm, and is competitive with a state-of-the-art DNN method on the challenging task of traffic flow prediction; 5) we show that DGPG possesses several other desirable characteristics: it can model uncertainties with high accuracy, and the automatic relevance determination (ARD) kernel allows it to learn which neighbors and features are of greater importance for the prediction
- In this paper we propose a method to learn the mappings between input and output signals over graphs, which we call Stochastic Deep Gaussian Processes over Graphs (DGPG)
- DGPG exhibits several appealing characteristics, such as the ability to accurately model uncertainties and to automatically discover which vertices and features are relevant to the prediction
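The ARD mechanism mentioned above can be sketched in a few lines of NumPy: an RBF kernel with per-dimension lengthscales, where a very large lengthscale makes a feature effectively irrelevant. This is a generic ARD illustration with made-up data, not the paper's kernel code:

```python
import numpy as np

def ard_rbf(A, B, lengthscales):
    """RBF kernel with per-dimension (ARD) lengthscales. A large lengthscale
    on a dimension means that dimension barely influences the kernel value."""
    diff = (A[:, None, :] - B[None, :, :]) / lengthscales
    return np.exp(-0.5 * (diff**2).sum(-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# Feature 2 gets a huge lengthscale, marking it as irrelevant.
ls = np.array([1.0, 1.0, 1e6])
K_full = ard_rbf(X, X, ls)
K_drop = ard_rbf(X[:, :2], X[:, :2], ls[:2])  # kernel with that feature removed
```

After optimizing such lengthscales by maximum likelihood, the inverse lengthscales serve as relevance scores: here `K_full` and `K_drop` are numerically indistinguishable, confirming that the down-weighted feature no longer matters.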

Results

- **Results in Table 5** show that: 1) DGPG is competitive w.r.t. the state-of-the-art method DCRNN, and outperforms it in most cases on the more challenging LA dataset; 2) DGPG produces accurate predictions for different datasets and forecasting horizons, showing its stability and consistency; 3) DGPG achieves its best performance with the appropriate number of layers, demonstrating that it benefits from deep structures; 4) the performance can be improved by utilizing the validation data during training.
- Variance Analysis: A distinguishing advantage of GPs is that they are capable of modeling predictive variances.
- Inspired by [51], the authors examine how many test instances fall inside the predictive confidence intervals.
- One can expect the portion of test instances falling inside each predictive interval to match the corresponding nominal coverage ratio.
- Table 4 shows the experimental results.
- The numerical results comply with this analysis reasonably well within the ±2σ intervals.
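The calibration check described above can be sketched as follows, using synthetic stand-ins (not the paper's data) for a GP's predictive means and standard deviations; the nominal ratios 0.683 and 0.954 come from the standard normal distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

mu = rng.normal(size=n)                  # predictive means
sigma = np.full(n, 0.5)                  # predictive standard deviations
y = mu + sigma * rng.standard_normal(n)  # targets consistent with the predictions

def coverage(y, mu, sigma, k):
    """Portion of targets inside the +-k*sigma predictive interval."""
    return float(np.mean(np.abs(y - mu) <= k * sigma))

# For well-calibrated Gaussian predictions these should be near 0.683 and 0.954.
c1 = coverage(y, mu, sigma, 1.0)
c2 = coverage(y, mu, sigma, 2.0)
```

Comparing such empirical ratios against the nominal ones is exactly the kind of check summarized in Table 4: large gaps would indicate over- or under-confident predictive variances.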

Conclusion

- In this paper the authors propose a method to learn the mappings between input and output signals over graphs, which the authors call Stochastic Deep Gaussian Processes over Graphs (DGPG).
- The authors conducted thorough experiments on both synthetic and real-world datasets.
- Numerical results on a synthetic dataset validate the theoretical assumptions and analysis, including the claim that the method has smaller sampling variances and converges faster.
- The authors show that the method generally outperforms other baselines on small datasets, and is competitive with the state-of-the-art method on the challenging task of traffic flow prediction.
- DGPG exhibits several appealing characteristics, such as the ability to accurately model uncertainties and to automatically discover which vertices and features are relevant to the prediction

Summary

## Objectives:

The authors' goals are different and the approaches share little similarity besides the minor points mentioned above.

- Table1: Several statistics that are of interest for the theoretical analysis
- Table2: Description of the datasets and the training/test splitting: numbers of training/test instances, number of nodes, and average degree. #Iter. is the number of iteration steps
- Table3: GP-L and GPG-L are baselines in [20]. SVR denotes support vector regression. For SVR the output is a function of its neighbors' input and high-order graph information is lost. Metrics of GCGP [19] are calculated using the recomposed prediction result. We report the results of DGPG using the linear kernel (L), RBF kernel (RBF), Matérn32 kernel (M32), and the optimal layer. Underlined terms denote the best results. Results of several other baselines are in the supplementary materials
- Table4: Variance analysis of DGPG
- Table5: Comparison on the task of traffic flow prediction. Results of other baselines are obtained from [48]

Related work

- To assuage the O(N^3) training complexity of standard GPs, several sparse approximation methods have been proposed [35, 36, 27, 28, 29, 31, 32]; the idea is to construct a manageable set of inducing points that summarizes the information in the original dataset. It is rigorously established in [37] that, under some technical assumptions, sparse approximation methods can produce reliable results with M = O(log^D N) inducing points, where D is the dimension of the data. Inspired by some of them, [10] introduces a method to select the inducing points using variational inference. This work later became the foundation of a series of deep Gaussian process models: inspired by [10] and the GP latent variable model (GP-LVM) [38], Damianou and Lawrence proposed deep GPs [7], which are composed of hierarchical Gaussian process mappings; several other deep GP models were proposed later [8, 9]. All these models demonstrate superior performance on a variety of challenging tasks. The works above derive analytically tractable evidence lower bounds; in contrast, [11] - the paper most closely related to ours - presents a model with milder independence assumptions and relies on a sampling technique. Applying the sampling technique not only allows the model to handle both Gaussian (e.g. in regression) and non-Gaussian (e.g. in classification) likelihoods in a uniform manner, but also enables it to take advantage of GPU acceleration. These strengths are also inherited by our proposal.
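The inducing-point idea behind these sparse methods can be sketched with a Nyström-style low-rank approximation in NumPy. This is an illustration under our own assumptions (the `rbf` helper, random data, and picking inducing inputs as a data subset are ours); real sparse GPs optimize the inducing inputs, e.g. variationally as in [10]:

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """RBF kernel matrix between row-vector sets A (n x d) and B (m x d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
N, M, D = 200, 20, 2                      # N data points, M << N inducing points
X = rng.normal(size=(N, D))
Z = X[rng.choice(N, M, replace=False)]    # inducing inputs (here: a data subset)

Knn = rbf(X, X)                            # exact N x N kernel: O(N^3) to invert
Knm = rbf(X, Z)                            # cross-covariances to inducing inputs
Kmm = rbf(Z, Z) + 1e-8 * np.eye(M)         # M x M kernel, with jitter
Qnn = Knm @ np.linalg.solve(Kmm, Knm.T)    # rank-M Nystrom approximation of Knn

rel_err = np.linalg.norm(Knn - Qnn) / np.linalg.norm(Knn)
```

Working with `Kmm` and `Knm` instead of `Knn` is what brings the cost down from O(N^3) toward O(N M^2), at the price of the approximation error measured by `rel_err`.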

Funding

- Acknowledgments and Disclosure of Funding This work is supported in part by the National Key Research and Development Program of China under Grant 2018YFB1800204, the National Natural Science Foundation of China under Grant 61771273, the R&D Program of Shenzhen under Grant JCYJ20180508152204044, and the project “PCL Future Greater-Bay Area Network Facilities for Large-scale Experiments and Applications (LZC0019)”

Study subjects and analysis

randomly selected parents: 7

The goal of this example is to support our theoretical analysis with numerical evidence, e.g. verifying the positive-definiteness assumption and the claim that the graph can reduce sampling variances and lead to faster convergence. Consider a symmetric graph with 500 nodes, where each node is connected to approximately 7 randomly selected parents (including a self-connecting edge). We sample each input signal xi ∈ R500 from a standard multivariate normal distribution with isotropic covariance, x ∼ N (0, I)
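A synthetic graph of this kind can be generated as follows; this is our own sketch of the construction described above (seed, symmetrization order, and helper names are assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
N_NODES, N_PARENTS = 500, 7     # sizes taken from the description above

# Boolean adjacency matrix: each node gets ~7 randomly selected parents,
# one of which is a self-connecting edge; symmetrized afterwards.
A = np.zeros((N_NODES, N_NODES), dtype=bool)
for i in range(N_NODES):
    parents = rng.choice(N_NODES, size=N_PARENTS - 1, replace=False)
    A[i, parents] = True
    A[i, i] = True               # self-connecting edge
A = A | A.T                      # make the graph symmetric

# One input signal x in R^500 drawn from N(0, I).
x = rng.standard_normal(N_NODES)

avg_degree = float(A.sum(axis=1).mean())
```

Symmetrization roughly doubles each node's degree (outgoing plus incoming edges), which is why the resulting average degree lands somewhat above the nominal 7.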

Reference

- C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning, MIT Press, Jan. 2006.
- A. B. Chan, Z.-S. J. Liang, and N. Vasconcelos, “Privacy preserving crowd monitoring: Counting people without people models or tracking,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR-2008), 2008.
- J. Snoek, H. Larochelle, and R. P. Adams, “Practical bayesian optimization of machine learning algorithms,” in Advances in Neural Information Processing Systems (NIPS-2012), pp. 2951– 2959, 2012.
- H. Liu, Y.-S. Ong, and J. Cai, “A survey of adaptive sampling for global metamodeling in support of simulation-based complex engineering design,” Structural and Multidisciplinary Optimization, vol. 57, no. 1, pp. 393–416, 2018.
- E. V. Bonilla, K. M. Chai, and C. Williams, “Multi-task gaussian process prediction,” in Advances in Neural Information Processing Systems (NIPS-2008), pp. 153–160, 2008.
- M. Kuss and C. E. Rasmussen, “Gaussian processes in reinforcement learning,” in Advances in Neural Information Processing Systems (NIPS-2004), pp. 751–758, 2004.
- A. C. Damianou and N. D. Lawrence, “Deep gaussian processes,” in International Conference on Artificial Intelligence and Statistics (AISTATS-2013), pp. 207–215, 2013.
- Z. Dai, A. C. Damianou, J. González, and N. D. Lawrence, “Variational auto-encoded deep gaussian processes,” in International Conference on Learning Representations (ICLR-2016), 2016.
- C. L. C. Mattos, Z. Dai, A. C. Damianou, J. Forth, G. A. Barreto, and N. D. Lawrence, “Recurrent gaussian processes,” in International Conference on Learning Representations (ICLR-2016), 2016.
- M. Titsias, “Variational learning of inducing variables in sparse gaussian processes,” in Artificial Intelligence and Statistics (AISTATS-2009), pp. 567–574, 2009.
- H. Salimbeni and M. Deisenroth, “Doubly stochastic variational inference for deep gaussian processes,” in Advances in Neural Information Processing Systems (NIPS-2017), pp. 4588–4599, 2017.
- J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and locally connected networks on graphs,” in International Conference on Learning Representations (ICLR-2014), 2014.
- D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, “Convolutional networks on graphs for learning molecular fingerprints,” in Advances in Neural Information Processing Systems (NIPS-2015), pp. 2224–2232, 2015.
- J. Atwood and D. Towsley, “Diffusion-convolutional neural networks,” in Advances in Neural Information Processing Systems (NIPS-2016), pp. 1993–2001, 2016.
- M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Advances in Neural Information Processing Systems (NIPS-2016), pp. 3844–3852, 2016.
- T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in International Conference on Learning Representations (ICLR-2017), 2017.
- W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Advances in Neural Information Processing Systems (NIPS-2017), pp. 1024–1034, 2017.
- Y. C. Ng, N. Colombo, and R. Silva, “Bayesian semi-supervised learning with graph gaussian processes,” in Advances in Neural Information Processing Systems (NIPS-2018), pp. 1690–1701, 2018.
- I. Walker and B. Glocker, “Graph convolutional gaussian processes,” in International Conference on Machine Learning (ICML-2019), pp. 6495–6504, 2019.
- A. Venkitaraman, S. Chatterjee, and P. Handel, “Gaussian processes over graphs,” in International Conference on Acoustics, Speech and Signal Processing (ICASSP-2020), pp. 5640–5644, IEEE, 2020.
- C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
- V. Tresp, “A bayesian committee machine,” Neural Computation, vol. 12, no. 11, pp. 2719–2741, 2000.
- Y. Cao and D. J. Fleet, “Generalized product of experts for automatic and principled fusion of gaussian process predictions,” arXiv preprint arXiv:1410.7827, 2014.
- M. Deisenroth and J. W. Ng, “Distributed gaussian processes,” in International Conference on Machine Learning (ICML-2015), pp. 1481–1490, 2015.
- H. Liu, J. Cai, Y. Wang, and Y. S. Ong, “Generalized robust bayesian committee machine for large-scale gaussian process regression,” in International Conference on Machine Learning (ICML-2018), 2018.
- H. Liu, Y.-S. Ong, X. Shen, and J. Cai, “When gaussian process meets big data: A review of scalable gps,” IEEE Transactions on Neural Networks and Learning Systems, 2020.
- J. Quiñonero-Candela and C. E. Rasmussen, “A unifying view of sparse approximate gaussian process regression,” Journal of Machine Learning Research, vol. 6, no. Dec, pp. 1939–1959, 2005.
- E. Snelson and Z. Ghahramani, “Sparse gaussian processes using pseudo-inputs,” in Advances in Neural Information Processing Systems (NIPS-2005), 2006.
- E. Snelson and Z. Ghahramani, “Local and global sparse gaussian process approximations,” in Artificial Intelligence and Statistics (AISTATS-2007), pp. 524–531, 2007.
- J. Hensman, N. Fusi, and N. D. Lawrence, “Gaussian processes for big data,” in Conference on Uncertainty in Artificial Intelligence (UAI-2013), pp. 282–290, 2013.
- M. K. Titsias, “Variational inference for gaussian and determinantal point processes,” in Workshop on Advances in Variational Inference, 2014.
- A. G. d. G. Matthews, Scalable Gaussian process inference using variational methods. PhD thesis, University of Cambridge, 2017.
- D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” in International Conference on Machine Learning (ICML2014), pp. 1278–1286, 2014.
- D. P. Kingma, T. Salimans, and M. Welling, “Variational dropout and the local reparameterization trick,” in Advances in Neural Information Processing Systems (NIPS-2015), pp. 2575–2583, 2015.
- C. K. I. Williams and M. W. Seeger, “Using the nyström method to speed up kernel machines,” in Advances in Neural Information Processing Systems (NIPS-2000), pp. 682–688, 2000.
- M. W. Seeger, C. K. I. Williams, and N. D. Lawrence, “Fast forward selection to speed up sparse gaussian process regression,” in International Workshop on Artificial Intelligence and Statistics (AISTATS-2003), 2003.
- D. Burt, C. E. Rasmussen, and M. Van Der Wilk, “Rates of convergence for sparse variational gaussian process regression,” in International Conference on Machine Learning, pp. 862–871, 2019.
- M. K. Titsias and N. D. Lawrence, “Bayesian gaussian process latent variable model,” in International Conference on Artificial Intelligence and Statistics, (AISTATS-2010), pp. 844–851, 2010.
- J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun, “Graph neural networks: A review of methods and applications,” arXiv preprint arXiv:1812.08434, 2018.
- Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, “A comprehensive survey on graph neural networks,” IEEE Transactions on Neural Networks and Learning Systems, 2020.
- H. Rue and L. Held, Gaussian Markov random fields: theory and applications. CRC press, 2005.
- P. Sidén and F. Lindsten, “Deep gaussian markov random fields,” arXiv preprint arXiv:2002.07467, 2020.
- A. G. d. G. Matthews, M. van der Wilk, T. Nickson, K. Fujii, A. Boukouvalas, P. León-Villagrá, Z. Ghahramani, and J. Hensman, “GPflow: A Gaussian process library using TensorFlow,” Journal of Machine Learning Research, vol. 18, pp. 1–6, apr 2017.
- “Swedish meteorological and hydrological institute.” http://opendata-download-metobs.smhi.se/. Last accessed:2020-05-27.
- H. Behjat, U. Richter, D. Van De Ville, and L. Sörnmo, “Signal-adapted tight frames on graphs,” IEEE Transactions on Signal Processing, vol. 64, no. 22, pp. 6017–6029, 2016.
- H. Van Dop, G. Graziani, and W. Klug, “ETEX: A european tracer experiment,” in Large Scale Computations in Air Pollution Modelling, pp. 137–150, Springer, 1999.
- H. Jagadish, J. Gehrke, A. Labrinidis, Y. Papakonstantinou, J. M. Patel, R. Ramakrishnan, and C. Shahabi, “Big data and its technical challenges,” Communications of the ACM, vol. 57, no. 7, pp. 86–94, 2014.
- Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion convolutional recurrent neural network: Data-driven traffic forecasting,” in International Conference on Learning Representations (ICLR-2018), 2018.
- J. D. Hamilton, Time series analysis, vol. 2. Princeton, New Jersey, 1994.
- I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in Neural Information Processing Systems (NIPS-2014), pp. 3104–3112, 2014.
- D. Chai, L. Wang, and Q. Yang, “Bike flow prediction with multi-graph convolutional networks,” in International Conference on Advances in Geographic Information Systems, pp. 397–400, 2018.
