Neuronal Gaussian Process Regression

NeurIPS 2020


Abstract

The brain takes uncertainty intrinsic to our world into account. For example, associating spatial locations with rewards requires predicting not only the expected reward at new spatial locations but also its uncertainty, to avoid catastrophic events and forage safely. A powerful and flexible framework for nonlinear regression that takes uncertainty...

Introduction
  • Predictive processing represents one of the fundamental principles of neural computations [1].
  • In the motor domain the brain employs predictive forward models [2], and a fundamental aspect of learned behavior is the ability to form associations between predictive environmental events and rewarding outcomes.
  • These are just two examples of the general regression task that the brain has to solve: predicting a dependent target variable given explanatory input variable(s).
  • The covariance function k(x, x′) depends on hyperparameters, which are usually learned by maximizing the log marginal likelihood
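To make the preceding point concrete, here is a minimal NumPy sketch (not the paper's code) that evaluates a GP's log marginal likelihood under an RBF kernel and picks the length scale and noise variance that maximize it; the toy data, grid values, and kernel choice are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale, signal_var=1.0):
    """Squared-exponential covariance k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2))."""
    d = x1[:, None] - x2[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def log_marginal_likelihood(x, y, length_scale, noise_var):
    """log p(y | x, hyperparameters) for a zero-mean GP with Gaussian noise."""
    n = len(x)
    K = rbf_kernel(x, x, length_scale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))        # -0.5 log|K|
            - 0.5 * n * np.log(2 * np.pi))

# Toy data: y_i = f(x_i) + eps_i with Gaussian observation noise (illustrative).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, 40))
y = np.sin(x) + 0.1 * rng.standard_normal(40)

# Learn hyperparameters by maximizing the log marginal likelihood (a crude grid
# search here; gradient-based optimization is what GP libraries use in practice).
grid = [(l, s2) for l in (0.3, 0.5, 1.0, 2.0) for s2 in (0.01, 0.05, 0.1)]
best = max(grid, key=lambda p: log_marginal_likelihood(x, y, *p))
print("selected (length scale, noise variance):", best)
```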
Highlights
  • Predictive processing represents one of the fundamental principles of neural computations [1]
  • A standard regression model assumes yᵢ = f(xᵢ) + εᵢ, where f is an unknown latent function and εᵢ ∼ N(0, σ²) is Gaussian observation noise
  • This paper introduces a biologically plausible implementation of Gaussian processes
  • With regard to machine learning this paper shows a correspondence between Gaussian processes and certain neural networks and raises the question of how best to perform nonlinear regression with uncertainty estimates
Methods
  • Methods that have an exactly diagonal Kuu have been proposed [35], but these rely on spectral inter-domain features [36].
  • If σ is small or n is large, one can neglect the noise term entirely.
  • While it is unclear to the author how these weights can be learned in a biologically plausible manner, one can approximate them.
  • The second term in Eq (9) is approximately zero and can be neglected compared to the first term, because σ⁻² k_fjᵀ k_fj = O(s² · ns²/σ²).
  • One can approximate Kuu by its diagonal s²I, yielding weights U = s⁻¹I that are constant, so no plasticity is necessary.
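As a rough illustration of the last bullet (a sketch under standard sparse-GP assumptions, not the paper's Eq (9) derivation), the following compares the usual DTC/VFE-style sparse predictive mean with the simplified version obtained by replacing Kuu with its diagonal s²I; the kernel, inducing-point positions, and data are made up for the example.

```python
import numpy as np

def rbf(a, b, ell=1.0, sf2=1.0):
    """Squared-exponential kernel matrix between point sets a and b."""
    d = a[:, None] - b[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 6, 200))              # training inputs
y = np.sin(x) + 0.1 * rng.standard_normal(200)   # noisy targets
z = np.linspace(0.5, 5.5, 6)                     # inducing-point positions (illustrative)
xs = np.linspace(0, 6, 50)                       # test inputs
noise = 0.01                                     # observation noise variance sigma^2

Kuu = rbf(z, z)
Kuf = rbf(z, x)
Ksu = rbf(xs, z)

# Standard sparse-GP (DTC/VFE-style) predictive mean:
#   mu(x*) = k_{*u} (sigma^2 Kuu + Kuf Kfu)^{-1} Kuf y
A = noise * Kuu + Kuf @ Kuf.T
mu_sparse = Ksu @ np.linalg.solve(A, Kuf @ y)

# Diagonal approximation: Kuu ~ s^2 I with s^2 = k(z, z) (the kernel's signal
# variance), so the first-layer readout weights U = s^{-1} I stay constant.
s2 = np.mean(np.diag(Kuu))
A_diag = noise * s2 * np.eye(len(z)) + Kuf @ Kuf.T
mu_diag = Ksu @ np.linalg.solve(A_diag, Kuf @ y)

print("max |difference| between exact and diagonal-Kuu means:",
      np.max(np.abs(mu_sparse - mu_diag)))
```

With an RBF kernel the diagonal of Kuu equals the signal variance, so the approximation only discards the off-diagonal covariances between inducing points.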
Conclusion
  • The author has introduced a biologically plausible Gaussian process approximation with good predictive performance and close approximation of the full Gaussian process.
  • With regard to neuroscience, the paper introduces a biologically plausible Gaussian process approximation whose predictive performance and closeness to the full Gaussian process are comparable to those of VFE and FITC.
  • It yields initial results in line with existing experimental data and motivates new experiments for a more direct test of the model.
  • Ethical aspects and future societal consequences do not apply to this work
Tables
  • Table 1: Characteristics of the analyzed data sets, and average predictive log-likelihood ± std. errors for Monte Carlo Dropout (Dropout, [13]), Probabilistic Back-propagation (PBP, [12]), a sparse GP (VFE, [28]), and an artificial neural network (ANN) with architecture corresponding to a sparse GP (but differing weights). I set the number of inducing points equal to the number of hidden-layer neurons in [12, 13]. For the too-big Year Prediction MSD dataset I used the Stochastic Variational GP of [46]. Again, the kernel length scales and the inducing-point positions of the BioNN were set to the values obtained with VFE. On these tasks VFE performs about as well as, if not better than, Dropout and PBP.
  • Table 2: Average KL(p‖q) and std. errors between the full GP p and the sparse approximation q
Related work
  • Several other works have investigated how the brain could implement Bayesian inference, cf. [24, 25] and references therein. They proposed neural codes for encoding probability distributions over one or few sensory input variables which are scalars or vectors, whereas a Gaussian process is a distribution over functions [7]. Earlier works considered neural representations of the uncertainty p(x) of input variables x, whereas this work considers the neural encoding of a probability distribution p(f) over a dependent target function f(x). To my knowledge, this is the first work to suggest how the brain could perform Bayesian nonparametric regression via GPs.
Funding
  • Acknowledgments and Disclosure of Funding: The author was internally funded by the Simons Foundation.
Study subjects and analysis
UCI datasets: 10
I next evaluated the performance of my BioNN on larger and higher dimensional data. I replicate the experiment set-up in [12] and compare to the predictive log-likelihood of Probabilistic Backpropagation [12] and Monte Carlo Dropout [13] on ten UCI datasets [45], cf. Table 1
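The metric reported in Table 1 is the average predictive log-likelihood on held-out points. Below is a minimal sketch of this metric for Gaussian predictive distributions; the numbers are placeholders, not results from the paper.

```python
import numpy as np

def avg_predictive_log_likelihood(y_true, mu, var):
    """Mean of log N(y_true | mu, var) over a test set."""
    return np.mean(-0.5 * np.log(2 * np.pi * var)
                   - 0.5 * (y_true - mu) ** 2 / var)

# Hypothetical predictive means/variances on three test points (illustrative only).
y_test = np.array([1.0, 0.2, -0.5])
mu = np.array([0.9, 0.1, -0.3])
var = np.array([0.05, 0.04, 0.10])
print(avg_predictive_log_likelihood(y_test, mu, var))
```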

datasets with merely O(1,000) data points: 5
Fig. 4 reveals overall comparable performance of my BioNN to VFE and FITC. (As a biologically plausible control baseline, I also considered a RBF network that connects not only the mean but also the variance predicting neuron directly to the first layer neurons, but it performed badly due to overfitting.) Although the main objective is good predictive performance, I was also interested in how well my BioNN approximates the GP. For the five datasets with merely O(1,000) data points I was able to fit the full GP. Table 2 shows that my BioNN approximates the full GP nearly as well as VFE and much…
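Table 2's KL(p‖q) compares the full GP posterior p with a sparse approximation q. For Gaussian posteriors evaluated at a common, finite set of points this is the closed-form Gaussian KL divergence, sketched below; how the paper chooses the evaluation points is not reproduced here.

```python
import numpy as np

def kl_gaussians(mu_p, cov_p, mu_q, cov_q):
    """KL(p || q) for multivariate Gaussians p = N(mu_p, cov_p), q = N(mu_q, cov_q)."""
    k = len(mu_p)
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(cov_q_inv @ cov_p) + diff @ cov_q_inv @ diff
                  - k + logdet_q - logdet_p)

# Sanity check: identical Gaussians give KL = 0 (illustrative values only).
mu, cov = np.zeros(3), np.eye(3)
print(kl_gaussians(mu, cov, mu, cov))  # 0.0
```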

tuning curve centers: 6
Fig. 5 shows how the centers, as well as the widths, of the tuning curves can be learned using REINFORCE, Eq (16). For each train/test split the 6 tuning curve centers were initialized on a regular grid at {0.5, 1.5, ..., 5.5} and updated to minimize the squared prediction error. As control variate I used a running average of the MSE
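The paper's Eq (16) is not reproduced here; the sketch below illustrates the generic REINFORCE recipe described above: perturb the tuning-curve centers with Gaussian exploration noise, use the squared prediction error as (negative) reward, and subtract a running average of the MSE as control variate. The readout function, noise scale, and learning rate are illustrative assumptions, and only the centers (not the widths) are updated for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 6, 500)
y = np.sin(x) + 0.1 * rng.standard_normal(500)

centers = np.arange(6) + 0.5         # regular grid {0.5, 1.5, ..., 5.5}
width = 0.8                          # tuning-curve width (kept fixed here)
lr, expl_sigma, baseline = 0.05, 0.05, 0.0

def predict(xi, c):
    """Normalized-RBF readout from tuning-curve activations (crude, for demonstration)."""
    phi = np.exp(-0.5 * ((xi - c) / width) ** 2)
    return phi @ np.sin(c) / (phi.sum() + 1e-12)

for xi, yi in zip(x, y):
    noise = expl_sigma * rng.standard_normal(len(centers))  # explore around the centers
    err = (predict(xi, centers + noise) - yi) ** 2          # squared prediction error
    baseline = 0.99 * baseline + 0.01 * err                 # running-average MSE (control variate)
    # REINFORCE: move centers along the exploration noise, weighted by (baseline - error).
    centers += lr * (baseline - err) * noise / expl_sigma ** 2

print("learned centers:", np.round(centers, 2))
```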

References
[1] A. Bubic, D. Y. Von Cramon, and R. I. Schubotz. Prediction, cognition and the brain. Front. Hum. Neurosci., 4:25, 2010.
[2] D. M. Wolpert and Z. Ghahramani. Computational principles of movement neuroscience. Nat. Neurosci., 3(11):1212–1217, 2000.
[3] D. C. Knill and A. Pouget. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci., 27(12):712–719, 2004.
[4] K. P. Körding and D. M. Wolpert. Bayesian integration in sensorimotor learning. Nature, 427(6971):244–247, 2004.
[5] C. Padoa-Schioppa and J. A. Assad. Neurons in the orbitofrontal cortex encode economic value. Nature, 441(7090):223–226, 2006.
[6] M. O’Neill and W. Schultz. Coding of reward risk by orbitofrontal neurons is mostly distinct from coding of reward value. Neuron, 68(4):789–800, 2010.
[7] C. E. Rasmussen and C. K. Williams. Gaussian processes for machine learning. The MIT Press, 2006.
[8] T. L. Griffiths, C. Lucas, J. Williams, and M. L. Kalish. Modeling human function learning with Gaussian processes. In NIPS, pages 553–560, 2009.
[9] C. G. Lucas, T. L. Griffiths, J. J. Williams, and M. L. Kalish. A rational model of function learning. Psychon. Bull. Rev., 22(5):1193–1215, 2015.
[10] C. M. Wu, E. Schulz, M. Speekenbrink, J. D. Nelson, and B. Meder. Generalization guides human exploration in vast decision spaces. Nat. Hum. Behav., 2(12):915–924, 2018.
[11] R. M. Neal. Bayesian learning for neural networks. Springer, 1996.
[12] J. M. Hernández-Lobato and R. Adams. Probabilistic backpropagation for scalable learning of Bayesian neural networks. In ICML, pages 1861–1869, 2015.
[13] Y. Gal and Z. Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In ICML, pages 1050–1059, 2016.
[14] D. O. Hebb. The organization of behavior: A neuropsychological theory. Wiley, 1949.
[15] P. Földiak. Forming sparse representations by local anti-Hebbian learning. Biol. Cybern., 64(2):165–170, 1990.
[16] D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol., 160(1):106–154, 1962.
[17] J. O’Keefe and J. Dostrovsky. The hippocampus as a spatial map: preliminary evidence from unit activity in the freely-moving rat. Brain Research, 34(1):171–175, 1971.
[18] A. Georgopoulos, J. Kalaska, R. Caminiti, and J. Massey. On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. J. Neurosci., 2(11):1527–1537, 1982.
[19] W. Schultz, P. Dayan, and P. R. Montague. A neural substrate of prediction and reward. Science, 275(5306):1593–1599, 1997.
[20] M. O’Neill and W. Schultz. Risk prediction error coding in orbitofrontal neurons. J. Neurosci., 33(40):15810–15814, 2013.
[21] I. Lee, A. L. Griffin, E. A. Zilli, H. Eichenbaum, and M. E. Hasselmo. Gradual translocation of spatial correlates of neuronal firing in the hippocampus toward prospective reward locations. Neuron, 51(5):639–650, 2006.
[22] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn., 8:229–256, 1992.
[23] H. S. Seung. Learning in spiking neural networks by reinforcement of stochastic synaptic transmission. Neuron, 40(6):1063–1073, 2003.
[24] W. J. Ma, J. M. Beck, P. E. Latham, and A. Pouget. Bayesian inference with probabilistic population codes. Nat. Neurosci., 9(11):1432–1438, 2006.
[25] J. Fiser, P. Berkes, G. Orbán, and M. Lengyel. Statistically optimal perception and learning: from behavior to neural representations. Trends Cogn. Sci., 14(3):119–130, 2010.
[26] J. Quiñonero-Candela and C. E. Rasmussen. A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res., 6:1939–1959, 2005.
[27] T. D. Bui, J. Yan, and R. E. Turner. A unifying framework for Gaussian process pseudo-point approximations using power expectation propagation. J. Mach. Learn. Res., 18(1):3649–3720, 2017.
[28] M. Titsias. Variational learning of inducing variables in sparse Gaussian processes. In AISTATS, pages 567–574, 2009.
[29] A. G. d. G. Matthews, J. Hensman, R. Turner, and Z. Ghahramani. On sparse variational methods and the Kullback-Leibler divergence between stochastic processes. In AISTATS, pages 231–239, 2016.
[30] M. Seeger, C. K. Williams, and N. D. Lawrence. Fast forward selection to speed up sparse Gaussian process regression. In AISTATS, pages 205–212, 2003.
[31] T. Poggio and F. Girosi. Networks for approximation and learning. Proc. IEEE, 78(9):1481–1497, 1990.
[32] M. M. Lavrentiev. Some improperly posed problems of mathematical physics. Springer, 1967.
[33] A. N. Tikhonov and V. I. Arsenin. Solutions of ill-posed problems. V. H. Winston & Sons, 1977.
[34] M. Bauer, M. van der Wilk, and C. E. Rasmussen. Understanding probabilistic sparse Gaussian process approximations. In NIPS, pages 1533–1541, 2016.
[35] D. R. Burt, C. E. Rasmussen, and M. van der Wilk. Rates of convergence for sparse variational Gaussian process regression. In ICML, pages 862–871, 2019.
[36] M. Lázaro-Gredilla and A. Figueiras-Vidal. Inter-domain Gaussian processes for sparse inference using inducing features. In NIPS, pages 1087–1095, 2009.
[37] J. S. Anderson, I. Lampl, D. C. Gillespie, and D. Ferster. The contribution of noise to contrast invariance of orientation tuning in cat visual cortex. Science, 290(5498):1968–1972, 2000.
[38] D. J. Heeger. Half-squaring in responses of cat striate cells. Vis. Neurosci., 9(5):427–443, 1992.
[39] K. D. Miller and T. W. Troyer. Neural noise can explain expansive, power-law nonlinearities in neural response functions. J. Neurophysiol., 87(2):653–659, 2002.
[40] R. Cossart, D. Aronov, and R. Yuste. Attractor dynamics of network up states in the neocortex. Nature, 423(6937):283–288, 2003.
[41] R. C. Froemke, M. M. Merzenich, and C. E. Schreiner. A synaptic memory trace for cortical receptive field plasticity. Nature, 450(7168):425–429, 2007.
[42] C. Walder, K. I. Kim, and B. Schölkopf. Sparse multiscale Gaussian process regression. In ICML, pages 1112–1119, 2008.
[43] GPy. GPy: A Gaussian process framework in Python. http://github.com/SheffieldML/GPy.
[44] E. Snelson and Z. Ghahramani. Sparse Gaussian processes using pseudo-inputs. In NIPS, pages 1257–1264, 2006.
[45] D. Dua and C. Graff. UCI machine learning repository. http://archive.ics.uci.edu/ml, 2019.
[46] J. Hensman, N. N. Fusi, and N. D. Lawrence. Gaussian processes for big data. In UAI, pages 282–290, 2013.
[47] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[48] S. A. Hollup, S. Molden, J. G. Donnett, M. B. Moser, and E. I. Moser. Accumulation of hippocampal place fields at the goal location in an annular watermaze task. J. Neurosci., 21(5):1635–44, 2001.
[49] O. Mamad, L. Stumpp, H. M. McNamara, C. Ramakrishnan, K. Deisseroth, R. B. Reilly, and M. Tsanov. Place field assembly distribution encodes preferred locations. PLoS Biol., 15(9):e2002365, 2017.
[50] A. P. Steiner and A. D. Redish. The road not taken: Neural correlates of decision making in orbitofrontal cortex. Front. Neurosci., 6:1–21, 2012.
[51] A. M. Wikenheiser and G. Schoenbaum. Over the river, through the woods: Cognitive maps in the hippocampus and orbitofrontal cortex. Nat. Rev. Neurosci., 17(8):513–523, 2016.
[52] C. S. Lansink, P. M. Goltstein, J. V. Lankelma, B. L. McNaughton, and C. M. A. Pennartz. Hippocampus leads ventral striatum in replay of place-reward information. PLoS Biol., 7(8):e1000173, 2009.
[53] M. A. van der Meer, A. Johnson, N. C. Schmitzer-Torbert, and A. D. Redish. Triple dissociation of information processing in dorsal striatum, ventral striatum, and hippocampus on a learned spatial decision task. Neuron, 67(1):25–32, 2010.
[54] K. Preuschoff, P. Bossaerts, and S. R. Quartz. Neural differentiation of expected reward and risk in human subcortical structures. Neuron, 51(3):381–390, 2006.
Author
Johannes Friedrich