Sparse Spectrum Warped Input Measures for Nonstationary Kernel Learning

NeurIPS 2020

Abstract

We establish a general form of explicit, input-dependent, measure-valued warpings for learning nonstationary kernels. While stationary kernels are ubiquitous and simple to use, they struggle to adapt to functions that vary in smoothness with respect to the input. The proposed learning algorithm warps inputs as conditional Gaussian measures...

Introduction
  • Many interesting real world phenomena exhibit varying characteristics, such as smoothness, across their domain.
  • The typical kernel-based learner canonically relies on a stationary kernel function, a measure of "similarity", to define the prior beliefs over the function space.
  • Such a kernel cannot represent desirable nonstationary nuances, like varying spatial smoothness and sudden discontinuities.
  • One obvious way to alleviate the problem of finding the appropriate kernel function given one’s data is hyperparameter optimisation.
Highlights
  • Many interesting real world phenomena exhibit varying characteristics, such as smoothness, across their domain
  • In this paper we propose a method for nonstationary kernel learning, based on sparse spectral kernel representations
  • We have provided implementations for the random Fourier features kernels (RFFS and RFFNS) and sparse spectrum warped input measures (SSWIM); a minimal sketch of the RFF construction follows this list
  • We have proposed a crucial advance to the sparse spectrum Gaussian process framework to account for nonstationarity through a novel input warping formulation
  • We introduced a novel form of input warping that analytically incorporates complete Gaussian measures into the functional warping, using the concepts of pseudo-training data and latent self-supervision
  • Our model suggests an interesting and effective inductive bias that is nicely interpreted as a learned conditional affine transformation
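
The RFF implementations referenced above rest on the sparse spectrum / random Fourier feature construction [10, 11]: by Bochner's theorem [9], a stationary kernel is the expectation of a product of randomized cosine features drawn from its spectral density. Below is a minimal NumPy sketch of the stationary map for a squared-exponential kernel; the function names and lengthscale value are ours for illustration and are not taken from the authors' code.

```python
import numpy as np

def make_rff(d, D, lengthscale=1.0, seed=0):
    """Draw spectral frequencies W ~ N(0, I / lengthscale^2) and phases b ~ U[0, 2*pi];
    by Bochner's theorem these correspond to the squared-exponential kernel."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d, D)) / lengthscale
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return W, b

def rff_features(X, W, b):
    """Map inputs X (n, d) to features (n, D) so that phi(x) @ phi(y) approximates k(x, y)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Usage: the feature inner products approach the exact RBF kernel as D grows.
d, D, ls = 2, 2000, 1.5
W, b = make_rff(d, D, lengthscale=ls)
X = np.random.default_rng(1).standard_normal((5, d))
Phi = rff_features(X, W, b)
sqdist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
print(np.max(np.abs(Phi @ Phi.T - np.exp(-0.5 * sqdist / ls ** 2))))  # small for large D
```
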
Methods
  • Method comparison on real-world datasets (MSE and MNLP; columns GP, BWGP, MLWGP3, MLWGP20, SSWIM1, SSWIM2; MSE for ailerons ×10⁻⁸):

    MSE       GP            BWGP          MLWGP3        MLWGP20       SSWIM1          SSWIM2
    abalone   4.55 ± 0.14   4.55 ± 0.11   4.54 ± 0.10   4.59 ± 0.32   4.64 ± 0.13     4.50 ± 0.11
    creep     584.9 ± 71.2  491.8 ± 36.2  502.3 ± 43.3  506.3 ± 46.1  483.69 ± 64.12  279.86 ± 31.88
    ailerons  2.95 ± 0.16   2.91 ± 0.14   2.80 ± 0.11   3.42 ± 2.87   2.96 ± 0.08     2.83 ± 0.06

    MNLP      GP            BWGP          MLWGP3        MLWGP20       SSWIM1          SSWIM2
    abalone   2.17 ± 0.01   1.99 ± 0.01   1.97 ± 0.02   1.99 ± 0.05   2.18 ± 0.01     2.17 ± 0.02
    creep     4.46 ± 0.03   4.31 ± 0.04   4.21 ± 0.03   4.21 ± 0.08   4.45 ± 0.03     4.27 ± 0.03
    ailerons  -7.30 ± 0.01  -7.38 ± 0.02  -7.44 ± 0.01  -7.45 ± 0.08  -7.24 ± 0.01    -7.00 ± 0.02

    C Additional Experiments

    C.1 Increasing number of pseudo-training points: For the "increasing number of pseudo-training points" experiment we used 1 layer of warping with 256 features for both the warping and top-level predictive functions.

    C.2 Increasing warping depth: The authors used 256 features and 1280 pseudo-training points for all of the experiments.

    C.3 Complete real-dataset experiments table: Table 2 contains additional real-world experiments to extend the major experimental results from the main paper.

    The additional datasets include elevators (18, 8751) and airfoil (5, 1503), listed as (input dimension, number of points).

    C.4 Extended discussion: It is imperative to note here that the aim is not to claim any algorithmic dominance when comparing methods.
  • The authors ran with 256 features and 1280 pseudo-training points for 150 steps, with 10 repeats, and evaluated the test RMSE and test MNLP for every epoch of optimisation.
  • Other loss functions and training schemes, such as leave-one-out cross validation, could also be considered.
  • These results corroborate long-known discussions from [?] about the risk of overfitting when trusting the marginal likelihood with standard optimisation procedures; their importance seems to have been largely ignored in the evaluation of recent methodological innovations in the GP literature.
  • The authors believe that a more open discussion should be had about the interplay between model expressiveness and the effect this has on overfitting; this is especially pertinent to the GP literature, which has placed a large emphasis on the importance of the marginal likelihood as a valid hyperparameter optimisation loss (a minimal sketch of this objective follows this list)
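
For context on the preceding points, the hyperparameter optimisation loss in question is the GP negative log marginal likelihood. Below is a minimal NumPy sketch of that objective for an exact zero-mean GP with an RBF kernel, written under our own assumptions rather than reflecting the authors' implementation; in practice it is minimised with gradient-based optimisers, and driving it down does not guarantee better test RMSE/MNLP.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale, variance):
    """Squared-exponential kernel matrix."""
    sqdist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

def neg_log_marginal_likelihood(X, y, lengthscale, variance, noise):
    """-log p(y | X, theta) for a zero-mean GP: data fit + complexity penalty + constant."""
    n = X.shape[0]
    K = rbf_kernel(X, X, lengthscale, variance) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    data_fit = 0.5 * y @ alpha
    complexity = np.sum(np.log(np.diag(L)))              # 0.5 * log det K
    return data_fit + complexity + 0.5 * n * np.log(2.0 * np.pi)

# Usage: scan hyperparameters on toy data; the minimiser of this objective
# is not necessarily the setting that generalises best.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
for ls in (0.1, 0.5, 1.0, 2.0):
    print(ls, neg_log_marginal_likelihood(X, y, ls, 1.0, 0.01))
```
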
Conclusion
  • The authors have proposed a crucial advance to the sparse spectrum Gaussian process framework to account for nonstationarity through a novel input warping formulation.
  • The authors' model suggests an interesting and effective inductive bias that is nicely interpreted as a learned conditional affine transformation (see the illustrative sketch after this list).
  • This perspective invites a fresh take on how more effective representations of nonstationary data can be discovered.
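
To make the "learned conditional affine transformation" interpretation concrete, here is an illustrative toy sketch, under our own assumptions rather than the paper's formulation, of an input-dependent affine warp whose output would then be passed to a stationary kernel or feature map. SSWIM's actual warping is built from conditional Gaussian measures and pseudo-training data, which this hypothetical stand-in does not reproduce.

```python
import numpy as np

def conditional_affine_warp(X, A_fn, b_fn):
    """Illustrative warp w(x) = A(x) x + b(x): an affine map whose parameters
    depend on the input itself (hypothetical stand-in, not the SSWIM construction)."""
    return np.stack([A_fn(x) @ x + b_fn(x) for x in X])

# Toy choice: the scaling varies smoothly with the first coordinate, so a stationary
# kernel applied to the warped inputs behaves nonstationarily in the original space.
d = 2
A_fn = lambda x: np.eye(d) * (1.0 + np.tanh(x[0]))
b_fn = lambda x: 0.1 * x ** 2

X = np.random.default_rng(0).standard_normal((4, d))
print(conditional_affine_warp(X, A_fn, b_fn).shape)  # (4, 2) warped inputs
```
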
Tables
  • Table 1: RMSE and MNLP metrics for various real-world datasets. MSE and MNLP metrics for comparison with Warped and Bayesian Warped GPs [?]. MSE results for ailerons are ×10⁻⁸.
Related work
  • Foundational work [26, 27] on kernel-based nonstationarity necessitated manipulation of the kernel function with expensive inference procedures. Recent spectral representations of kernel functions have emerged via Bochner's theorem [9]. In this paradigm, one constructs kernels in the Fourier domain via random Fourier features (RFFs) [10, 11] and extensions for nonstationarity via the generalised Fourier inverse transform [28, 23, 2, 29]. While general, these methods suffer from various drawbacks such as expensive computations and overfitting due to over-parameterised models [2]. More expressive modelling frameworks [30, 31, 32, 33] have played a major role in expanding the efficacy of kernel-based learning. Perhaps the most well known in the recent literature are Deep Kernel Learning (DKL) of Wilson et al. [22] and the deep Gaussian process (DGP) [34], together with its various extensions [25, 35, 36]. While functionally elegant, methods like DKL and DGP often rely on increasing the complexity of the composition to produce expressiveness and are often unsuitable or unwieldy in practice, occasionally resulting in performance worse than stationary inducing-point GPs [25]. We remark that a notable difference between DGP and SSWIM is that one should interpret our pseudo-training points as hyperparameters of the kernel, as opposed to parameters of a variational approximation. Simple bijective input warpings were considered in [37] for transforming nonstationary functions into more well-behaved functions (a minimal sketch of such a warp follows this paragraph). In [38] the authors augment the standard GP model by learning nonstationary, data-dependent functions for the hyperparameters of a nonstationary squared-exponential kernel [39]; however, this approach is limited to low dimensions. More recently, the work of [40] has explored a dynamical-systems view of input warpings by processing the inputs through time-dependent differential fields. Less related models presented in Wang and Neal [41], Dutordoir et al. [42], and Snelson et al. [43] involve output warping, non-Gaussian likelihoods, and heteroscedastic noise. For the curious reader we examine contrasting properties of output and input warping in the supplementary material.
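
As a point of contrast with the bijective warpings of [37] mentioned above, the following is a minimal sketch of a per-dimension Beta-CDF input warp in that spirit, assuming inputs already scaled to [0, 1]; the function name and parameter values are ours, and this is not the SSWIM warping.

```python
import numpy as np
from scipy.stats import beta

def beta_cdf_warp(X, a, b):
    """Bijective per-dimension warp w_j(x) = BetaCDF(x; a_j, b_j) for inputs in [0, 1];
    learning (a, b) lets a stationary kernel on w(x) capture nonstationarity in x."""
    return np.column_stack([beta.cdf(X[:, j], a[j], b[j]) for j in range(X.shape[1])])

# Usage: different (a, b) per dimension stretch or compress different regions of the domain.
X = np.random.default_rng(0).uniform(size=(5, 2))
print(beta_cdf_warp(X, a=np.array([2.0, 0.5]), b=np.array([0.5, 2.0])))
```
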
References
  • Andrew Y Ng, Adam Coates, Mark Diel, Varun Ganapathi, Jamie Schulte, Ben Tse, Eric Berger, and Eric Liang. Autonomous inverted helicopter flight via reinforcement learning. In Experimental robotics IX. Springer, 2006.
  • Jean-Francois Ton, Seth Flaxman, Dino Sejdinovic, and Samir Bhatt. Spatial mapping with Gaussian processes and nonstationary Fourier features. Spatial Statistics, 2018.
  • Nir Friedman, Michal Linial, Iftach Nachman, and Dana Pe’er. Using Bayesian networks to analyze expression data. Journal of computational biology, 2000.
  • Ruben Martinez-Cantin. Bayesian optimization with adaptive kernels for robot control. In 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017.
  • Carl Edward Rasmussen. Gaussian processes in machine learning. In Advanced lectures on machine learning. Springer, 2004.
  • H. Bauer. Probability theory and elements of measure theory. Probability and mathematical statistics. Academic Press, 1981.
  • Miguel Lázaro-Gredilla, Joaquin Quiñonero-Candela, Carl Edward Rasmussen, and Aníbal R. Figueiras-Vidal. Sparse spectrum Gaussian process regression. Journal of Machine Learning Research (JMLR), 2010.
  • Yunpeng Pan, Xinyan Yan, Evangelos A. Theodorou, and Byron Boots. Prediction under uncertainty in sparse spectrum Gaussian processes with applications to filtering and control. In Proceedings of the 34th International Conference on Machine Learning (ICML), volume 70 of Proceedings of Machine Learning Research, 2017.
  • Salomon Bochner. Vorlesungen über Fouriersche Integrale. Akad. Verl.-Ges., 1932.
  • A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Neural Information Processing Systems (NIPS), 2007.
  • A. Rahimi and B. Recht. Weighted sums of random kitchen sinks: Replacing minimization with randomisation in learning. In Neural Information Processing Systems (NIPS), 2008.
  • Christopher M. Bishop. Pattern recognition and machine learning (Information Science and Statistics). Springer, New York, 2006; corrected 2nd printing, 2007.
  • Mauricio A Alvarez, Lorenzo Rosasco, and Neil D Lawrence. Kernels for Vector-Valued Functions: a Review. Technical report, MIT - Computer Science and Artificial Intelligence Laboratory, 2011.
  • Carl Jidling, Niklas Wahlström, Adrian Wills, and Thomas B. Schön. Linearly constrained Gaussian processes. In Advances in Neural Information Processing Systems (NIPS), 2017.
  • Rafael Oliveira, Lionel Ott, and Fabio Ramos. Bayesian optimisation under uncertain inputs. In International Conference on Artificial Intelligence and Statistics (AISTATS), Naha, Okinawa, Japan, 2019.
  • Michalis Titsias. Variational learning of inducing variables in sparse Gaussian processes. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.
  • H. Neudecker, S. Liu, and W. Polasek. The Hadamard product and some of its applications in statistics. Statistics, 26(4):365–373, 1995.
  • Rafael González and Richard Woods. Digital image processing. ISBN 9780131687288. Prentice Hall, 2008.
  • Dheeru Dua and Casey Graff. UCI machine learning repository. http://archive.ics.uci.edu/ml, 2017.
  • Luís Torgo. Regression datasets. https://www.dcc.fc.up.pt/~ltorgo/Regression/DataSets.html.
  • D Cole, C Martin-Moran, AG Sheard, HKDH Bhadeshia, and DJC MacKay. Modelling creep rupture strength of ferritic steel welds. Science and Technology of Welding and Joining, 2000.
  • Andrew Gordon Wilson, Zhiting Hu, Ruslan Salakhutdinov, and Eric P Xing. Deep kernel learning. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2016.
  • Sami Remes, Markus Heinonen, and Samuel Kaski. Non-stationary spectral kernels. In Advances in Neural Information Processing Systems (NIPS), 2017.
  • James Hensman, Alexander G. de G. Matthews, and Zoubin Ghahramani. Scalable variational Gaussian process classification. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.
  • Hugh Salimbeni and Marc Deisenroth. Doubly stochastic variational inference for deep Gaussian processes. In Advances in Neural Information Processing Systems (NIPS), 2017.
  • Dave Higdon, Jenise Swall, and J Kern. Non-stationary spatial modeling. Bayesian statistics, 1999.
  • Christopher J Paciorek and Mark J Schervish. Nonstationary covariance functions for Gaussian process regression. In Advances in Neural Information Processing Systems (NIPS), 2004.
  • Yves-Laurent Kom Samo and Stephen Roberts. Generalized spectral kernels. arXiv preprint arXiv:1506.02236, 2015.
  • Shengyang Sun, Guodong Zhang, Chaoqi Wang, Wenyuan Zeng, Jiaman Li, and Roger Grosse. Differentiable compositional kernel learning for Gaussian processes. In International Conference on Machine Learning (ICML), 2018.
  • Roberto Calandra, Jan Peters, Carl Edward Rasmussen, and Marc Peter Deisenroth. Manifold Gaussian processes for regression. In 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 2016.
  • Andrew Gordon Wilson, David A. Knowles, and Zoubin Ghahramani. Gaussian process regression networks. In Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1139–1146, 2012.
  • Paul D Sampson and Peter Guttorp. Nonparametric estimation of nonstationary spatial covariance structure. Journal of the American Statistical Association, 1992.
  • Ethan B Anderes, Michael L Stein, et al. Estimating deformations of isotropic Gaussian random fields on the plane. The Annals of Statistics, 2008.
  • Andreas Damianou and Neil Lawrence. Deep Gaussian processes. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 207–215, 2013.
  • Kurt Cutajar, Edwin V Bonilla, Pietro Michiardi, and Maurizio Filippone. Random feature expansions for deep Gaussian processes. In International Conference on Machine Learning (ICML), 2017.
  • Thang Bui, Daniel Hernández-Lobato, José Miguel Hernández-Lobato, Yingzhen Li, and Richard Turner. Deep Gaussian processes for regression using approximate expectation propagation. In International Conference on Machine Learning (ICML), 2016.
  • Jasper Snoek, Kevin Swersky, Rich Zemel, and Ryan Adams. Input warping for Bayesian optimization of non-stationary functions. In International Conference on Machine Learning (ICML), 2014.
  • Markus Heinonen, Henrik Mannerström, Juho Rousu, Samuel Kaski, and Harri Lähdesmäki. Nonstationary Gaussian process regression with Hamiltonian Monte Carlo. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2016.
  • M. N. Gibbs. Bayesian Gaussian processes for regression and classification. Ph. D. Thesis, Department of Physics, University of Cambridge, 1997.
  • Pashupati Hegde, Markus Heinonen, Harri Lähdesmäki, and Samuel Kaski. Deep learning with differential Gaussian process flows. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
  • Chunyi Wang and Radford M. Neal. Gaussian Process Regression with Heteroscedastic or Non-Gaussian Residuals. Technical report, University of Toronto, Toronto, Canada, 2012. URL http://arxiv.org/abs/1212.6246.
  • Vincent Dutordoir, Hugh Salimbeni, James Hensman, and Marc Deisenroth. Gaussian Process Conditional Density Estimation. In S Bengio, H Wallach, H Larochelle, K Grauman, N Cesa-Bianchi, and R Garnett, editors, Advances in Neural Information Processing Systems 31, pages 2385–2395. Curran Associates, Inc., 2018.
  • Edward Snelson, Zoubin Ghahramani, and Carl E Rasmussen. Warped Gaussian processes. In Advances in Neural Information Processing Systems (NIPS), 2004.
  • Ransalu Senanayake, Simon O’Callaghan, and Fabio Ramos. Predicting spatio-temporal propagation of seasonal influenza using variational Gaussian process regression. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.