Stochastic Deep Gaussian Processes over Graphs

NIPS 2020

Cited by: 0|Views34
EI
Full Text
Bibtex
Weibo

Abstract

In this paper we propose Stochastic Deep Gaussian Processes over Graphs (DGPG), which are deep Gaussian models that learn the mappings between input and output signals in graph domains. The approximate posterior distributions of the latent variables are derived with variational inference, and the evidence lower bound is evaluated and optimized...
Introduction
  • Gaussian processes (GPs) [1] are a favoured choice in the machine learning arsenal, thanks to their distinguishing advantages: they model uncertainties, they allow expert knowledge to be introduced through flexible kernel design, and their data efficiency accounts for their success on small and medium-sized datasets.
  • A series of works on deep GPs [7,8,9] overcome the aforementioned disadvantages.
  • All these methods trace their roots to the seminal paper [10], which proposes to summarize the information in the original large dataset with a manageable set of inducing points; the inducing points and their latent function values are inferred by variational inference.
  • A large body of work has emerged to assuage the complexity burden [22,23,24,25,26], among which sparse Gaussian processes stand out as some of the most successful and popular methods [27,28,29,30,31,32]; a short sketch of the exact-GP computation that these methods approximate follows this list.
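To make these points concrete, here is a minimal NumPy sketch (illustrative, not taken from the paper) of exact GP regression with an RBF kernel: the predictive variance is where the uncertainty estimates come from, and the Cholesky factorisation of the N x N kernel matrix is the O(N^3) bottleneck that the inducing-point methods cited above are designed to avoid.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gp_predict(X, y, Xstar, noise_var=0.01):
    """Exact GP posterior mean and variance at test inputs Xstar."""
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)                  # O(N^3): the cost sparse GPs avoid
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf_kernel(X, Xstar)
    mean = Ks.T @ alpha                        # predictive mean
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf_kernel(Xstar, Xstar)) - np.sum(v**2, axis=0)
    return mean, var                           # variance quantifies uncertainty

# Toy usage: 50 noisy observations of sin(x).
X = np.random.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(50)
mean, var = gp_predict(X, y, np.linspace(-3, 3, 5)[:, None])
```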
Highlights
  • In this paper we propose Stochastic Deep Gaussian Processes over Graphs (DGPG), which is a method for modeling the relations between input and output signals over graph-structured domains
  • We summarize the main contributions of this paper as follows: 1) we propose a novel Bayesian non-parametric method called Stochastic Deep Gaussian Processes over Graphs (DGPG) to model the relations of input/output signals over graphs; 2) it is rigorously proved that, under some technical assumptions, the sampling variances of DGPG are strictly less than those of [11], implying that DGPG achieves faster convergence by exploiting graph information; 3) we performed experiments on a synthetic dataset
  • 4) We show that our method outperforms a recent GP-based graph learning algorithm and is competitive with a state-of-the-art DNN method on the challenging task of traffic flow prediction; 5) we show that DGPG possesses several other desirable characteristics: it can model uncertainties with high accuracy, and the automatic relevance determination (ARD) kernel allows it to learn which neighbors and features are of greater importance for the prediction
  • In this paper we propose a method to learn the mappings between input and output signals over graphs, which we call Stochastic Deep Gaussian Processes over Graphs (DGPG)
  • DGPG exhibits several appealing characteristics, such as the ability to accurately model uncertainties and to automatically discover which vertices and features are relevant to the prediction; a minimal ARD illustration follows this list.
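The ARD idea can be illustrated with a small, hypothetical scikit-learn snippet (this is not the authors' GPflow-based implementation): an RBF kernel with one lengthscale per input dimension is fitted by maximizing the marginal likelihood, and dimensions that end up with large lengthscales contribute little to the prediction, so the learned lengthscales act as relevance scores for features or, in DGPG's setting, for neighboring vertices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# Only the first two input dimensions actually drive the target.
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=200)

# ARD kernel: an independent lengthscale for each of the 4 input dimensions.
kernel = RBF(length_scale=np.ones(4), length_scale_bounds=(1e-2, 1e3))
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-2).fit(X, y)

# Small learned lengthscales mark relevant dimensions, large ones irrelevant.
print(gpr.kernel_.length_scale)
```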
Results
  • Results in Table 5 show that: 1) DGPG is competitive with the state-of-the-art method DCRNN and outperforms it in most cases on the more challenging LA dataset; 2) DGPG produces accurate predictions for different datasets and forecasting horizons, showing its stability and consistency; 3) DGPG achieves its best performance with an appropriate number of layers, demonstrating that it benefits from deep structures; 4) performance can be improved by utilizing the validation data during training.
  • Variance analysis: a distinguishing advantage of GPs is that they are capable of modeling predictive variances.
  • Inspired by [51], the authors examine how many test instances fall within the predictive confidence intervals.
  • One expects the portion of test instances falling inside each predictive interval to match the interval's nominal coverage probability.
  • Table 4 shows the experimental results.
  • The numerical results comply with this analysis reasonably well, up to the ±2σ intervals; a coverage-check sketch follows this list.
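The check described above can be reproduced with a few lines of NumPy/SciPy (an illustrative sketch, not the authors' code): given predictive means and standard deviations together with the true test targets, compute the empirical fraction of targets inside the ±1σ, ±2σ and ±3σ intervals and compare it with the nominal Gaussian probabilities of roughly 68.3%, 95.4% and 99.7%.

```python
import numpy as np
from scipy.stats import norm

def coverage_check(y_true, pred_mean, pred_std, ks=(1, 2, 3)):
    """Empirical vs. nominal coverage of +/- k*sigma predictive intervals."""
    for k in ks:
        inside = np.abs(y_true - pred_mean) <= k * pred_std
        nominal = norm.cdf(k) - norm.cdf(-k)    # 0.683 for k=1, 0.954 for k=2
        print(f"+/-{k} sigma: empirical {inside.mean():.3f} vs nominal {nominal:.3f}")

# Toy usage with synthetic, well-calibrated predictions.
rng = np.random.default_rng(0)
pred_mean = rng.normal(size=1000)
pred_std = np.full(1000, 0.5)
y_true = pred_mean + pred_std * rng.standard_normal(1000)
coverage_check(y_true, pred_mean, pred_std)
```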
Conclusion
  • In this paper the authors propose a method to learn the mappings between input and output signals over graphs, which the authors call Stochastic Deep Gaussian Processes over Graphs (DGPG).
  • The authors conducted thorough experiments on both synthetic and real-world datasets.
  • Numerical results on a synthetic dataset validate the theoretical assumptions and analysis, in particular the claim that the method has smaller sampling variances and converges faster.
  • The authors show that the method generally outperforms other baselines on small datasets, and is competitive with the state-of-the-art method on the challenging task of traffic flow prediction.
  • DGPG exhibits several appealing characteristics, such as the ability to accurately model uncertainties and to automatically discover which vertices and features are relevant to the prediction
Summary
  • Objectives: The authors' goals are different, and the approaches share little similarity besides the minor points mentioned above.
Tables
  • Table1: Several statistics that are of interest for the theoretical analysis
  • Table2: Description of the real datasets and training/test splitting: number of training/test instances, number of nodes, and average degree. #Iter. is the number of training iterations
  • Table3: GP-L and GPG-L are baselines in [20]. SVR denotes support vector regression. For SVR the output is a function of its neighbors' input and high-order graph information is lost. Metrics of GCGP [19] are calculated using the recomposed prediction result. We report the results of DGPG using the linear kernel (L), RBF kernel (RBF), Matérn32 kernel (M32), and the optimal number of layers. Underlined entries denote the best results. Results of several other baselines are in the supplementary materials
  • Table4: Variance Analysis of DGPG
  • Table5: Comparison on the task of traffic flow prediction. Results of other baselines are obtained from [48]
Related work
  • To assuage the O(N^3) training complexity of standard GPs, several sparse approximation methods have been proposed [35, 36, 27, 28, 29, 31, 32]; their common idea is to construct a manageable set of inducing points that summarizes the information in the original dataset. It is rigorously established in [37] that, under some technical assumptions, sparse approximation methods can produce reliable results with M = O(log^D N) inducing points, where D is the dimension of the data. Inspired by some of them, [10] introduces a method to select the inducing points via variational inference. A few years later this work became the foundation of a series of deep Gaussian process models: inspired by [10] and the GP latent variable model (GP-LVM) [38], Damianou and Lawrence proposed deep GPs [7], which are composed of hierarchical Gaussian process mappings; several other deep GP models were later proposed [8, 9]. All these models demonstrate superior performance on a variety of challenging tasks. The works above derive analytically tractable evidence lower bounds; in contrast, [11] - the paper most closely related to ours - presents a model with a milder independence assumption and relies on a sampling technique. Applying the sampling technique not only allows the model to use both Gaussian (e.g. in regression) and non-Gaussian (e.g. in classification) likelihoods in a uniform manner, but also enables it to take advantage of GPU acceleration. These strengths are also inherited by our proposal; a schematic of the layer-wise sampling idea is sketched below.
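As a rough illustration of that sampling idea (a minimal schematic under stated assumptions, not the authors' implementation; mean_fn and var_fn are hypothetical stand-ins for a layer's variational predictive moments), each GP layer's output is drawn with the reparameterization trick, f = mu + sigma * eps with eps ~ N(0, 1), so a Monte Carlo estimate of the evidence lower bound can be differentiated through the samples.

```python
import numpy as np

def sample_gp_layer(mean_fn, var_fn, h, rng):
    """One reparameterized sample from a GP layer's marginal posterior at inputs h."""
    mu, var = mean_fn(h), var_fn(h)      # variational predictive mean and variance
    eps = rng.standard_normal(mu.shape)
    return mu + np.sqrt(var) * eps       # f = mu + sigma * eps

def propagate(layers, x, rng):
    """Push an input through stacked GP layers by sampling layer by layer.
    Averaging log-likelihoods of such samples gives a doubly stochastic
    ELBO estimate in the spirit of [11]."""
    h = x
    for mean_fn, var_fn in layers:
        h = sample_gp_layer(mean_fn, var_fn, h, rng)
    return h

# Toy usage with dummy layers (identity mean, constant variance).
rng = np.random.default_rng(0)
layers = [(lambda h: h, lambda h: 0.1 * np.ones_like(h)) for _ in range(3)]
sample = propagate(layers, np.zeros(5), rng)
```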
Funding
  • This work is supported in part by the National Key Research and Development Program of China under Grant 2018YFB1800204, the National Natural Science Foundation of China under Grant 61771273, the R&D Program of Shenzhen under Grant JCYJ20180508152204044, and the project “PCL Future Greater-Bay Area Network Facilities for Large-scale Experiments and Applications (LZC0019)”
Study subjects and analysis
randomly selected parents: 7
The goal of this example is to support our theoretical analysis with numerical evidence, e.g. verifying the positive-definiteness assumption and the claim that the graph structure reduces sampling variances and leads to faster convergence. Consider a symmetric graph with 500 nodes, where each node is connected to approximately 7 randomly selected parents (including a self-connecting edge). We sample each input signal x_i ∈ R^500 from a standard multivariate normal distribution with identity covariance, x ∼ N(0, I); a generation sketch is given below.
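A minimal NumPy sketch of how such a graph and its input signals could be generated (illustrative only; the exact sampling procedure used in the paper may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500   # number of nodes
K = 7     # approximate number of parents per node, including the self-loop

# Each node picks K-1 random parents plus itself; symmetrising the adjacency
# matrix makes the graph undirected, so the realised degree is only roughly K.
A = np.zeros((N, N), dtype=int)
for i in range(N):
    parents = rng.choice(N, size=K - 1, replace=False)
    A[i, parents] = 1
    A[i, i] = 1
A = np.maximum(A, A.T)

# One input graph signal x in R^500 drawn from N(0, I).
x = rng.standard_normal(N)
```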

small datasets: 3
Data efficiency is an appealing property of GPs. In this experiment we test the performance of DGPG on three small datasets: Weather [44], fMRI [45] and ETEX [46]. To ensure that our comparison is fair, we use identical settings; a description of the datasets is presented in Table 2

References
  • [1] C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning, MIT Press, Jan. 2006.
  • [2] A. B. Chan, Z.-S. J. Liang, and N. Vasconcelos, “Privacy preserving crowd monitoring: Counting people without people models or tracking,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR-2008), 2008.
  • [3] J. Snoek, H. Larochelle, and R. P. Adams, “Practical bayesian optimization of machine learning algorithms,” in Advances in Neural Information Processing Systems (NIPS-2012), pp. 2951–2959, 2012.
  • [4] H. Liu, Y.-S. Ong, and J. Cai, “A survey of adaptive sampling for global metamodeling in support of simulation-based complex engineering design,” Structural and Multidisciplinary Optimization, vol. 57, no. 1, pp. 393–416, 2018.
  • [5] E. V. Bonilla, K. M. Chai, and C. Williams, “Multi-task gaussian process prediction,” in Advances in Neural Information Processing Systems (NIPS-2008), pp. 153–160, 2008.
  • [6] M. Kuss and C. E. Rasmussen, “Gaussian processes in reinforcement learning,” in Advances in Neural Information Processing Systems (NIPS-2004), pp. 751–758, 2004.
  • [7] A. C. Damianou and N. D. Lawrence, “Deep gaussian processes,” in International Conference on Artificial Intelligence and Statistics (AISTATS-2013), pp. 207–215, 2013.
  • [8] Z. Dai, A. C. Damianou, J. González, and N. D. Lawrence, “Variational auto-encoded deep gaussian processes,” in International Conference on Learning Representations (ICLR-2016), 2016.
  • [9] C. L. C. Mattos, Z. Dai, A. C. Damianou, J. Forth, G. A. Barreto, and N. D. Lawrence, “Recurrent gaussian processes,” in International Conference on Learning Representations (ICLR-2016), 2016.
  • [10] M. Titsias, “Variational learning of inducing variables in sparse gaussian processes,” in Artificial Intelligence and Statistics (AISTATS-2009), pp. 567–574, 2009.
  • [11] H. Salimbeni and M. Deisenroth, “Doubly stochastic variational inference for deep gaussian processes,” in Advances in Neural Information Processing Systems (NIPS-2017), pp. 4588–4599, 2017.
  • [12] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and locally connected networks on graphs,” in International Conference on Learning Representations (ICLR-2014), 2014.
  • [13] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, “Convolutional networks on graphs for learning molecular fingerprints,” in Advances in Neural Information Processing Systems (NIPS-2015), pp. 2224–2232, 2015.
  • [14] J. Atwood and D. Towsley, “Diffusion-convolutional neural networks,” in Advances in Neural Information Processing Systems (NIPS-2016), pp. 1993–2001, 2016.
  • [15] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Advances in Neural Information Processing Systems (NIPS-2016), pp. 3844–3852, 2016.
  • [16] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in International Conference on Learning Representations (ICLR-2017), 2017.
  • [17] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Advances in Neural Information Processing Systems (NIPS-2017), pp. 1024–1034, 2017.
  • [18] Y. C. Ng, N. Colombo, and R. Silva, “Bayesian semi-supervised learning with graph gaussian processes,” in Advances in Neural Information Processing Systems (NIPS-2018), pp. 1690–1701, 2018.
  • [19] I. Walker and B. Glocker, “Graph convolutional gaussian processes,” in International Conference on Machine Learning (ICML-2019), pp. 6495–6504, 2019.
  • [20] A. Venkitaraman, S. Chatterjee, and P. Handel, “Gaussian processes over graphs,” in International Conference on Acoustics, Speech and Signal Processing (ICASSP-2020), pp. 5640–5644, IEEE, 2020.
  • [21] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
  • [22] V. Tresp, “A bayesian committee machine,” Neural Computation, vol. 12, no. 11, pp. 2719–2741, 2000.
  • [23] Y. Cao and D. J. Fleet, “Generalized product of experts for automatic and principled fusion of gaussian process predictions,” arXiv preprint arXiv:1410.7827, 2014.
  • [24] M. Deisenroth and J. W. Ng, “Distributed gaussian processes,” in International Conference on Machine Learning (ICML-2015), pp. 1481–1490, 2015.
  • [25] H. Liu, J. Cai, Y. Wang, and Y. S. Ong, “Generalized robust bayesian committee machine for large-scale gaussian process regression,” in International Conference on Machine Learning (ICML-2018), 2018.
  • [26] H. Liu, Y.-S. Ong, X. Shen, and J. Cai, “When gaussian process meets big data: A review of scalable gps,” IEEE Transactions on Neural Networks and Learning Systems, 2020.
  • [27] J. Quiñonero-Candela and C. E. Rasmussen, “A unifying view of sparse approximate gaussian process regression,” Journal of Machine Learning Research, vol. 6, pp. 1939–1959, 2005.
  • [28] E. Snelson and Z. Ghahramani, “Sparse gaussian processes using pseudo-inputs,” in Advances in Neural Information Processing Systems (NIPS-2006), 2006.
  • [29] E. Snelson and Z. Ghahramani, “Local and global sparse gaussian process approximations,” in Artificial Intelligence and Statistics (AISTATS-2007), pp. 524–531, 2007.
  • [30] J. Hensman, N. Fusi, and N. D. Lawrence, “Gaussian processes for big data,” in Conference on Uncertainty in Artificial Intelligence (UAI-2013), pp. 282–290, 2013.
  • [31] M. K. Titsias, “Variational inference for gaussian and determinantal point processes,” in Workshop on Advances in Variational Inference, 2014.
  • [32] A. G. d. G. Matthews, Scalable Gaussian process inference using variational methods. PhD thesis, University of Cambridge, 2017.
  • [33] D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” in International Conference on Machine Learning (ICML-2014), pp. 1278–1286, 2014.
  • [34] D. P. Kingma, T. Salimans, and M. Welling, “Variational dropout and the local reparameterization trick,” in Advances in Neural Information Processing Systems (NIPS-2015), pp. 2575–2583, 2015.
  • [35] C. K. I. Williams and M. W. Seeger, “Using the nyström method to speed up kernel machines,” in Advances in Neural Information Processing Systems (NIPS-2000), pp. 682–688, 2000.
  • [36] M. W. Seeger, C. K. I. Williams, and N. D. Lawrence, “Fast forward selection to speed up sparse gaussian process regression,” in International Workshop on Artificial Intelligence and Statistics (AISTATS-2003), 2003.
  • [37] D. Burt, C. E. Rasmussen, and M. Van Der Wilk, “Rates of convergence for sparse variational gaussian process regression,” in International Conference on Machine Learning (ICML-2019), pp. 862–871, 2019.
  • [38] M. K. Titsias and N. D. Lawrence, “Bayesian gaussian process latent variable model,” in International Conference on Artificial Intelligence and Statistics (AISTATS-2010), pp. 844–851, 2010.
  • [39] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun, “Graph neural networks: A review of methods and applications,” arXiv preprint arXiv:1812.08434, 2018.
  • [40] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, “A comprehensive survey on graph neural networks,” IEEE Transactions on Neural Networks and Learning Systems, 2020.
  • [41] H. Rue and L. Held, Gaussian Markov Random Fields: Theory and Applications. CRC Press, 2005.
  • [42] P. Sidén and F. Lindsten, “Deep gaussian markov random fields,” arXiv preprint arXiv:2002.07467, 2020.
  • [43] A. G. d. G. Matthews, M. van der Wilk, T. Nickson, K. Fujii, A. Boukouvalas, P. León-Villagrá, Z. Ghahramani, and J. Hensman, “GPflow: A Gaussian process library using TensorFlow,” Journal of Machine Learning Research, vol. 18, pp. 1–6, Apr. 2017.
  • [44] Swedish Meteorological and Hydrological Institute, http://opendata-download-metobs.smhi.se/. Last accessed: 2020-05-27.
  • [45] H. Behjat, U. Richter, D. Van De Ville, and L. Sörnmo, “Signal-adapted tight frames on graphs,” IEEE Transactions on Signal Processing, vol. 64, no. 22, pp. 6017–6029, 2016.
  • [46] H. Van Dop, G. Graziani, and W. Klug, “ETEX: A european tracer experiment,” in Large Scale Computations in Air Pollution Modelling, pp. 137–150, Springer, 1999.
  • [47] H. Jagadish, J. Gehrke, A. Labrinidis, Y. Papakonstantinou, J. M. Patel, R. Ramakrishnan, and C. Shahabi, “Big data and its technical challenges,” Communications of the ACM, vol. 57, no. 7, pp. 86–94, 2014.
  • [48] Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion convolutional recurrent neural network: Data-driven traffic forecasting,” in International Conference on Learning Representations (ICLR-2018), 2018.
  • [49] J. D. Hamilton, Time Series Analysis, vol. 2. Princeton, New Jersey, 1994.
  • [50] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in Neural Information Processing Systems (NIPS-2014), pp. 3104–3112, 2014.
  • [51] D. Chai, L. Wang, and Q. Yang, “Bike flow prediction with multi-graph convolutional networks,” in International Conference on Advances in Geographic Information Systems, pp. 397–400, 2018.
Authors
Naiqi Li
Wenjie Li
Jifeng Sun
Yinghua Gao