# Neuronal Gaussian Process Regression

NeurIPS 2020


Abstract

The brain takes uncertainty intrinsic to our world into account. For example, associating spatial locations with rewards requires predicting not only the expected reward at new spatial locations but also its uncertainty, to avoid catastrophic events and forage safely. A powerful and flexible framework for nonlinear regression that takes uncert…

Introduction

- Predictive processing represents one of the fundamental principles of neural computations [1].
- In the motor domain the brain employs predictive forward models [2], and a fundamental aspect of learned behavior is the ability to form associations between predictive environmental events and rewarding outcomes.
- These are just two examples of the general task of regression that the brain has to solve: predicting a dependent target variable given explanatory input variable(s).
- The covariance function k(x, x′) depends on hyperparameters, which are usually learned by maximizing the log marginal likelihood.
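As a concrete illustration of the last point, the sketch below fits the hyperparameters of a squared-exponential GP by minimizing the negative log marginal likelihood. This is standard GP regression, not the paper's neuronal implementation; the kernel form, toy data, and optimizer choice are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X1, X2, s, ell):
    """Squared-exponential covariance k(x, x') = s^2 exp(-(x - x')^2 / (2 ell^2))."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return s**2 * np.exp(-0.5 * d2 / ell**2)

def neg_log_marginal_likelihood(params, X, y):
    """Negative log p(y | X, theta) for a GP with Gaussian observation noise."""
    s, ell, sigma = np.exp(params)  # optimize in log space to keep parameters positive
    K = rbf_kernel(X, X, s, ell) + (sigma**2 + 1e-8) * np.eye(len(X))  # jitter for stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(X) * np.log(2 * np.pi)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 6.0, 30)
y = np.sin(X) + 0.1 * rng.standard_normal(30)

res = minimize(neg_log_marginal_likelihood, np.zeros(3), args=(X, y),
               method="Nelder-Mead")
s, ell, sigma = np.exp(res.x)
```

The same quantity is what GPy and other GP libraries optimize when calling their `optimize` routines.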

Highlights

- Predictive processing represents one of the fundamental principles of neural computations [1]
- A standard regression model assumes yᵢ = f(xᵢ) + εᵢ, where f is an unknown latent function that is corrupted by Gaussian observation noise εᵢ ∼ N(0, σ²)
- This paper introduces a biologically plausible implementation of Gaussian processes
- With regard to machine learning this paper shows a correspondence between Gaussian processes and certain neural networks and raises the question of how best to perform nonlinear regression with uncertainty estimates

Methods

Methods that have an exactly diagonal Kuu have been proposed [35], but these rely on spectral inter-domain features [36].

- If σ is small or n is large, one can neglect the noise term entirely.
- Because it is unclear to the author how these weights could be learned in a biologically plausible manner, one can approximate them.
- The second term in Eq. (9) is approximately zero and can be neglected compared to the first term, because σ⁻²k_fjᵀk_fj = O(n s⁴/σ²).
- One can approximate Kuu by its diagonal s²I, yielding weights U = s⁻¹I that are constant, so no plasticity is necessary.
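The last approximation is easy to check numerically: for a squared-exponential kernel, once the inducing points are spaced several length scales apart, Kuu is already nearly diagonal. A small numpy sketch (the spacing and length-scale values are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def rbf_kernel(X1, X2, s, ell):
    # k(x, x') = s^2 exp(-(x - x')^2 / (2 ell^2))
    return s**2 * np.exp(-0.5 * (X1[:, None] - X2[None, :]) ** 2 / ell**2)

s, ell = 1.0, 0.25
Z = np.arange(6.0)                      # inducing points spaced 4 length scales apart
Kuu = rbf_kernel(Z, Z, s, ell)

# Off-diagonal entries decay like exp(-spacing^2 / (2 ell^2)), so Kuu ~ s^2 I,
# and the corresponding constant weights need no plasticity.
print(np.max(np.abs(Kuu - s**2 * np.eye(6))))   # ~3e-4
U = np.eye(6) / s                               # U = s^(-1) I
```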

Conclusion

- The author has introduced a biologically plausible Gaussian process approximation with good predictive performance and a close approximation of the full Gaussian process, compared to VFE and FITC.
- With regard to neuroscience, it yields initial results in line with existing experimental data and motivates new experiments for a more direct test of the model.
- Ethical aspects and future societal consequences do not apply to this work.

Tables

- Table 1: Characteristics of the analyzed datasets, and average predictive log likelihood ± std. errors for Monte Carlo Dropout (Dropout, [13]), Probabilistic Backpropagation (PBP, [12]), a sparse GP (VFE, [28]), and an artificial neural network (ANN) with architecture corresponding to a sparse GP (but differing weights). I set the number of inducing points equal to the number of hidden layer neurons in [12, 13]. For the Year Prediction MSD dataset, which is too big for this, I used the Stochastic Variational GP of [46]. Again, the kernel length scales and the inducing point positions of the BioNN were set to the values obtained with VFE. On these tasks VFE performs about as well as, if not better than, Dropout and PBP.
- Table 2: Average KL(p‖q) and std. errors between the full GP p and its sparse approximation q
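The KL divergences in Table 2 compare two multivariate Gaussians: the full GP posterior p and its sparse approximation q evaluated at a set of points. For reference, a sketch of the closed-form Gaussian KL (a generic helper written for this summary, not code from the paper):

```python
import numpy as np

def kl_gauss(mu_p, cov_p, mu_q, cov_q):
    """KL(p || q) between multivariate Gaussians p = N(mu_p, cov_p), q = N(mu_q, cov_q)."""
    k = len(mu_p)
    Lq = np.linalg.cholesky(cov_q)
    solve_q = lambda B: np.linalg.solve(Lq.T, np.linalg.solve(Lq, B))
    trace_term = np.trace(solve_q(cov_p))                 # tr(cov_q^-1 cov_p)
    diff = mu_q - mu_p
    quad_term = diff @ solve_q(diff)                      # Mahalanobis term
    logdet_q = 2 * np.sum(np.log(np.diag(Lq)))
    logdet_p = 2 * np.sum(np.log(np.diag(np.linalg.cholesky(cov_p))))
    return 0.5 * (trace_term + quad_term - k + logdet_q - logdet_p)

# Sanity check: KL between identical distributions is zero.
mu, cov = np.zeros(3), np.eye(3)
print(kl_gauss(mu, cov, mu, cov))  # 0.0
```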

Related work

- Several other works have investigated how the brain could implement Bayesian inference, cf. [24, 25] and references therein. They proposed neural codes for encoding probability distributions over one or a few sensory input variables, which are scalars or vectors, whereas a Gaussian process is a distribution over functions [7]. Earlier works considered neural representations of the uncertainty p(x) of input variables x, whereas this work considers the neural encoding of a probability distribution p(f) over a dependent target function f(x). To my knowledge, this is the first work to suggest how the brain could perform Bayesian nonparametric regression via GPs.

Funding

- The author was internally funded by the Simons Foundation.

Study subjects and analysis

UCI datasets: 10

I next evaluated the performance of my BioNN on larger and higher-dimensional data. I replicated the experimental set-up of [12] and compared to the predictive log likelihood of Probabilistic Backpropagation [12] and Monte Carlo Dropout [13] on ten UCI datasets [45], cf. Table 1
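The comparison metric here is the average predictive log likelihood of the test targets under each model's Gaussian predictive distribution. A minimal sketch of how it is computed (an illustration for this summary, assuming per-point Gaussian predictions with mean `mu` and variance `var`):

```python
import numpy as np

def avg_predictive_ll(y_true, mu, var):
    """Average per-point log N(y | mu, var) of test targets under the
    model's Gaussian predictive distributions."""
    return np.mean(-0.5 * np.log(2 * np.pi * var) - 0.5 * (y_true - mu) ** 2 / var)

# With perfect predictions and unit variance, each point contributes -0.5 ln(2 pi).
y = np.array([0.0, 1.0])
mu = np.array([0.0, 1.0])
var = np.array([1.0, 1.0])
print(avg_predictive_ll(y, mu, var))  # ≈ -0.919
```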

datasets with merely O(1,000) data points: 5

Fig. 4 reveals overall comparable performance of my BioNN to VFE and FITC. (As a biologically plausible control baseline, I also considered an RBF network that connects not only the mean- but also the variance-predicting neuron directly to the first-layer neurons, but it performed badly due to overfitting.) Although the main objective is good predictive performance, I was also interested in how well my BioNN approximates the GP. For the five datasets with merely O(1,000) data points I was able to fit the full GP. Table 2 shows that my BioNN approximates the full GP nearly as well as VFE and much better than FITC.

tuning curve centers: 6

Fig. 5 shows how the centers, as well as the widths, of the tuning curves can be learned using REINFORCE, Eq (16). For each train/test split the 6 tuning curve centers were initialized on a regular grid at {0.5, 1.5, ..., 5.5} and updated to minimize the squared prediction error. As control variate I used a running average of the MSE
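The procedure above can be sketched with a generic REINFORCE perturbation rule: perturb the tuning curve centers, observe the resulting squared error, and move along perturbations that beat a running-average baseline. This is a simplification for illustration (deterministic RBF readout with least-squares weights; the learning rate, noise scale, and toy data are assumptions, not the paper's Eq (16) verbatim):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_predict(x, centers, widths, w):
    phi = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / widths) ** 2)
    return phi @ w

def fit_weights(x, y, centers, widths):
    # Readout weights by least squares (a simplifying assumption, not a learning rule).
    phi = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / widths) ** 2)
    return np.linalg.lstsq(phi, y, rcond=None)[0]

# Toy 1-D regression task; centers start on the regular grid {0.5, 1.5, ..., 5.5}.
X = rng.uniform(0.0, 6.0, 100)
y = np.sin(X)
centers = np.arange(0.5, 6.0, 1.0)
widths = np.full(6, 0.7)

w = fit_weights(X, y, centers, widths)
baseline = np.mean((rbf_predict(X, centers, widths, w) - y) ** 2)  # running-average control variate

lr, noise_sd = 0.02, 0.05
for step in range(500):
    eps = noise_sd * rng.standard_normal(6)          # perturb the tuning curve centers
    mse = np.mean((rbf_predict(X, centers + eps, widths, w) - y) ** 2)
    # REINFORCE: step along perturbations whose error beats the baseline.
    centers -= lr * (mse - baseline) * eps / noise_sd**2
    baseline = 0.9 * baseline + 0.1 * mse
    w = fit_weights(X, y, centers, widths)

final_mse = np.mean((rbf_predict(X, centers, widths, w) - y) ** 2)
```

The same perturb-and-reinforce structure extends to the tuning curve widths by perturbing them jointly with the centers.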

References

- A. Bubic, D. Y. Von Cramon, and R. I. Schubotz. Prediction, cognition and the brain. Front. Hum. Neurosci., 4:25, 2010.
- D. M. Wolpert and Z. Ghahramani. Computational principles of movement neuroscience. Nat. Neurosci., 3(11):1212–1217, 2000.
- D. C. Knill and A. Pouget. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci., 27(12):712–719, 2004.
- K. P. Körding and D. M. Wolpert. Bayesian integration in sensorimotor learning. Nature, 427(6971):244–247, 2004.
- C. Padoa-Schioppa and J. A. Assad. Neurons in the orbitofrontal cortex encode economic value. Nature, 441(7090):223–226, 2006.
- M. O’Neill and W. Schultz. Coding of reward risk by orbitofrontal neurons is mostly distinct from coding of reward value. Neuron, 68(4):789–800, 2010.
- C. E. Rasmussen and C. K. Williams. Gaussian processes for machine learning. The MIT Press, 2006.
- T. L. Griffiths, C. Lucas, J. Williams, and M. L. Kalish. Modeling human function learning with Gaussian processes. In NIPS, pages 553–560. 2009.
- C. G. Lucas, T. L. Griffiths, J. J. Williams, and M. L. Kalish. A rational model of function learning. Psychon. Bull. Rev., 22(5):1193–1215, 2015.
- C. M. Wu, E. Schulz, M. Speekenbrink, J. D. Nelson, and B. Meder. Generalization guides human exploration in vast decision spaces. Nat. Hum. Behav., 2(12):915–924, 2018.
- R. M. Neal. Bayesian learning for neural networks. Springer, 1996.
- J. M. Hernández-Lobato and R. Adams. Probabilistic backpropagation for scalable learning of Bayesian neural networks. In ICML, pages 1861–1869, 2015.
- Y. Gal and Z. Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In ICML, pages 1050–1059, 2016.
- D. O. Hebb. The organization of behavior: A neuropsychological theory. Wiley, 1949.
- P. Földiak. Forming sparse representations by local anti-hebbian learning. Biol. Cybern., 64(2):165–170, 1990.
- D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol., 160(1):106–154, 1962.
- J. O’Keefe and J. Dostrovsky. The hippocampus as a spatial map: preliminary evidence from unit activity in the freely-moving rat. Brain Research, 34(1):171–175, 1971.
- A. Georgopoulos, J. Kalaska, R. Caminiti, and J. Massey. On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. J. Neurosci., 2(11):1527–1537, 1982.
- W. Schultz, P. Dayan, and P. R. Montague. A neural substrate of prediction and reward. Science, 275(5306):1593–1599, 1997.
- M. O’Neill and W. Schultz. Risk prediction error coding in orbitofrontal neurons. J. Neurosci., 33(40):15810–15814, 2013.
- I. Lee, A. L. Griffin, E. A. Zilli, H. Eichenbaum, and M. E. Hasselmo. Gradual translocation of spatial correlates of neuronal firing in the hippocampus toward prospective reward locations. Neuron, 51(5):639–650, 2006.
- R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn., 8:229–256, 1992.
- H. S. Seung. Learning in spiking neural networks by reinforcement of stochastic synaptic transmission. Neuron, 40(6):1063–1073, 2003.
- W. J. Ma, J. M. Beck, P. E. Latham, and A. Pouget. Bayesian inference with probabilistic population codes. Nat. Neurosci., 9(11):1432–1438, 2006.
- J. Fiser, P. Berkes, G. Orbán, and M. Lengyel. Statistically optimal perception and learning: from behavior to neural representations. Trends Cogn. Sci., 14(3):119–130, 2010.
- J. Quiñonero-Candela and C. E. Rasmussen. A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res., 6:1939–1959, 2005.
- T. D. Bui, J. Yan, and R. E. Turner. A unifying framework for Gaussian process pseudo-point approximations using power expectation propagation. J. Mach. Learn. Res., 18(1):3649–3720, 2017.
- M. Titsias. Variational learning of inducing variables in sparse Gaussian processes. In AISTATS, pages 567–574, 2009.
- A. G. d. G. Matthews, J. Hensman, R. Turner, and Z. Ghahramani. On sparse variational methods and the kullback-leibler divergence between stochastic processes. In AISTATS, pages 231–239, 2016.
- M. Seeger, C. K. Williams, and N. D. Lawrence. Fast forward selection to speed up sparse Gaussian process regression. In AISTATS, pages 205–212, 2003.
- T. Poggio and F. Girosi. Networks for approximation and learning. Proc. IEEE, 78(9):1481–1497, 1990.
- M. M. Lavrentiev. Some improperly posed problems of mathematical physics. Springer, 1967.
- A. N. Tikhonov and V. I. Arsenin. Solutions of ill-posed problems. V. H. Winston & Sons, 1977.
- M. Bauer, M. van der Wilk, and C. E. Rasmussen. Understanding probabilistic sparse Gaussian process approximations. In NIPS, pages 1533–1541, 2016.
- D. R. Burt, C. E. Rasmussen, and M. Van Der Wilk. Rates of convergence for sparse variational Gaussian process regression. In ICML, pages 862–871, 2019.
- M. Lázaro-Gredilla and A. Figueiras-Vidal. Inter-domain Gaussian processes for sparse inference using inducing features. In NIPS, pages 1087–1095, 2009.
- J. S. Anderson, I. Lampl, D. C. Gillespie, and D. Ferster. The contribution of noise to contrast invariance of orientation tuning in cat visual cortex. Science, 290(5498):1968–1972, 2000.
- D. J. Heeger. Half-squaring in responses of cat striate cells. Vis. Neurosci., 9(5):427–443, 1992.
- K. D. Miller and T. W. Troyer. Neural noise can explain expansive, power-law nonlinearities in neural response functions. J. Neurophysiol., 87(2):653–659, 2002.
- R. Cossart, D. Aronov, and R. Yuste. Attractor dynamics of network up states in the neocortex. Nature, 423(6937):283–288, 2003.
- R. C. Froemke, M. M. Merzenich, and C. E. Schreiner. A synaptic memory trace for cortical receptive field plasticity. Nature, 450(7168):425–429, 2007.
- C. Walder, K. I. Kim, and B. Schölkopf. Sparse multiscale Gaussian process regression. In ICML, pages 1112–1119, 2008.
- GPy. GPy: A Gaussian process framework in python. http://github.com/SheffieldML/GPy, since 2012.
- E. Snelson and Z. Ghahramani. Sparse Gaussian processes using pseudo-inputs. In NIPS, pages 1257–1264, 2006.
- D. Dua and C. Graff. UCI machine learning repository. http://archive.ics.uci.edu/ml, 2019.
- J. Hensman, N. N. Fusi, and N. D. Lawrence. Gaussian processes for big data. In UAI, pages 282–290, 2013.
- D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
- S. A. Hollup, S. Molden, J. G. Donnett, M. B. Moser, and E. I. Moser. Accumulation of hippocampal place fields at the goal location in an annular watermaze task. J. Neurosci., 21(5):1635–44, 2001.
- O. Mamad, L. Stumpp, H. M. McNamara, C. Ramakrishnan, K. Deisseroth, R. B. Reilly, and M. Tsanov. Place field assembly distribution encodes preferred locations. PLoS Biol., 15(9):e2002365, 2017.
- A. P. Steiner and A. D. Redish. The road not taken: Neural correlates of decision making in orbitofrontal cortex. Front. Neurosci., 6:1–21, 2012.
- A. M. Wikenheiser and G. Schoenbaum. Over the river, through the woods: Cognitive maps in the hippocampus and orbitofrontal cortex. Nat. Rev. Neurosci., 17(8):513–523, 2016.
- C. S. Lansink, P. M. Goltstein, J. V. Lankelma, B. L. McNaughton, and C. M. A. Pennartz. Hippocampus leads ventral striatum in replay of place-reward information. PLoS Biol., 7(8):e1000173, 2009.
- M. A. van der Meer, A. Johnson, N. C. Schmitzer-Torbert, and A. D. Redish. Triple dissociation of information processing in dorsal striatum, ventral striatum, and hippocampus on a learned spatial decision task. Neuron, 67(1):25–32, 2010.
- K. Preuschoff, P. Bossaerts, and S. R. Quartz. Neural differentiation of expected reward and risk in human subcortical structures. Neuron, 51(3):381–390, 2006.
