# Efficient non-conjugate Gaussian process factor models for spike count data using polynomial approximations

International Conference on Machine Learning (ICML), 2020.

Abstract:

Gaussian Process Factor Analysis (GPFA) has been broadly applied to the problem of identifying smooth, low-dimensional temporal structure underlying large-scale neural recordings. However, spike trains are non-Gaussian, which motivates combining GPFA with discrete observation models for binned spike count data. The drawback to this approa...

Introduction

- Recent advances in neural recording technologies have enabled the collection of increasingly high-dimensional neural datasets.
- The authors introduce Polynomial Approximate Log-likelihood (PAL), a novel approximation scheme for efficient learning and inference in non-conjugate GPFA models with count observations and, more generally, in non-conjugate Gaussian latent variable models.

Highlights

- Recent advances in neural recording technologies have enabled the collection of increasingly high-dimensional neural datasets
- We evaluate the performance of our methods by applying Polynomial Approximate Log-likelihood (PAL) and Black Box Variational Inference (BBVI) to two different multi-neuron datasets, one from mouse visual cortex and one from monkey parietal cortex, under three different choices of count model
- We show that PAL achieves a substantial speedup over BBVI, and that count-Gaussian Process Factor Analysis (GPFA) models generally outperform standard Gaussian GPFA for extracting latent structure from spike train data
- The average MSE across all neurons for each count-GPFA model is shown in the middle panel of Figure 3, which reveals PAL's limitations in the Poisson case with an exponential nonlinearity
- We do this by initializing the BBVI algorithm with the hyperparameters provided by optimization of equation 4. This procedure is more stable than full BBVI with random initial hyperparameters, and achieves accurate model recovery. We demonstrate this is true not just for Poisson-GPFA, where BBVI provides more accurate solutions, but extends to all count-GPFA models
- An initial sharp increase in the evidence lower bound (ELBO) is observed in all models as the latent structure is approximately identified, whereas the hyperparameters are tuned toward the end of the BBVI optimization procedure

Results

- All PAL count-GPFA models have the same general form for the approximate log marginal likelihood: E(y|W, θ) ≈ log |Σ| + μ...
- For the binomial model, the authors map latents through a sigmoidal nonlinearity, σ(x) = 1/(1 + exp(−x)), to obtain the binomial parameter p, and set the number-of-trials parameter separately for each neuron to the maximum number of spikes observed in a single time bin.
- The max-count parameters for each neuron are concatenated across time bins into a single vector, and terms that do not depend on Wx are ignored. The problematic term here is the nonlinear second term, log(1 + exp(−x)), which the authors approximate, as before, using a second-order Chebyshev polynomial approximation.
- To derive a PAL estimator, the authors use a quadratic approximation to the nonlinear term log(1 + α exp(x)) on a per-neuron basis.
- The authors show, for each model, the nonlinearities approximated by Chebyshev polynomials, the expected number of spikes for the i-th neuron as a function of the latents X and the loadings matrix W, and the variance and mean of the polynomial-approximated marginal distribution.
- The average MSE across all neurons for each count-GPFA model is shown in the middle panel of Figure 3, which reveals PAL's limitations in the Poisson case with an exponential nonlinearity.
- An initial sharp increase in the ELBO is observed in all models as the latent structure is approximately identified, whereas the hyperparameters are tuned toward the end of the BBVI optimization procedure.
- The authors use the count-GPFA inference procedure to compare GPFA models with different noise characterizations to see which latent variable model best describes observed spiking data.
- The estimated latent dimensionality was 6 for all count-GPFA models, chosen via maximization of cross-validated log-likelihood.
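
The quadratic Chebyshev approximation mentioned in the bullets above can be illustrated with a small NumPy sketch. It interpolates log(1 + exp(x)) and exp(x) at Chebyshev nodes with a degree-2 polynomial and compares the worst-case errors. The interval [-4, 4], the helper name `quadratic_chebyshev`, and the use of node interpolation (rather than whatever fitting procedure the authors used) are assumptions made for illustration only.

```python
import numpy as np
from numpy.polynomial import Chebyshev, Polynomial


def softplus(x):
    """log(1 + exp(x)), evaluated stably."""
    return np.logaddexp(0.0, x)


def quadratic_chebyshev(f, lo, hi):
    """Interpolate f at Chebyshev nodes on [lo, hi] with a degree-2 polynomial.

    Returns the Chebyshev series object and its power-series coefficients
    (a0, a1, a2), so that f(x) is approximated by a0 + a1*x + a2*x**2.
    """
    cheb = Chebyshev.interpolate(f, deg=2, domain=[lo, hi])
    coef = cheb.convert(kind=Polynomial).coef
    return cheb, coef


# Hypothetical approximation interval; the source does not state the one used.
LO, HI = -4.0, 4.0
grid = np.linspace(LO, HI, 801)

cheb_sp, coef_sp = quadratic_chebyshev(softplus, LO, HI)
cheb_exp, coef_exp = quadratic_chebyshev(np.exp, LO, HI)

# Worst-case absolute error of each quadratic on the interval.
err_softplus = np.max(np.abs(cheb_sp(grid) - softplus(grid)))
err_exp = np.max(np.abs(cheb_exp(grid) - np.exp(grid)))

print("softplus coefficients (a0, a1, a2):", coef_sp)
print("max abs error, softplus:", err_softplus)
print("max abs error, exp:     ", err_exp)
```

On this interval the quadratic tracks the softplus-type term far more closely than it tracks the exponential, which is consistent with the reported limitation of PAL for the Poisson model with an exponential nonlinearity.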

Conclusion

- The authors have developed a novel technique for learning count Gaussian process factor analytic models that uses a polynomial approximate log-likelihood (PAL) for rapid closed-form evaluation of marginal likelihoods.
- This approximation can be used to estimate model parameters directly, or to provide initial values for black box variational inference that overcomes significant well-known BBVI optimization limitations.
- The authors tested the various non-conjugate GPFA models on neural data and found that these count-GPFA models perform comparably to or better than traditional GPFA approaches, which often do not consider count noise.

- Table 1: Summary of PAL expressions for count-GPFA models. The first row gives the spike rate of neuron i at time t given the latent vector x_t and the loading weights w_i for neuron i. The second row gives the nonlinear term of the log-likelihood that must be approximated under PAL. The third row gives H, defined by H = Σ⁻¹ − K⁻¹, which succinctly expresses the posterior covariance. The fourth row gives μ, the approximate posterior mean.
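
The relation H = Σ⁻¹ − K⁻¹ in the Table 1 caption is what makes the marginal likelihood available in closed form once the log-likelihood is quadratic in the latents. The sketch below is a generic illustration, not the authors' implementation: it assumes a log-likelihood of the form −½ xᵀHx + bᵀx + const with a zero-mean Gaussian prior of covariance K, and the function name, the squared-exponential kernel, and the toy Gaussian-noise check are all hypothetical choices. In the paper, H and the linear term depend on the loadings W and the polynomial coefficients of each count model.

```python
import numpy as np


def gaussian_posterior_and_evidence(K, H, b):
    """Closed-form posterior and log evidence for a quadratic log-likelihood.

    Assumes log p(y | x) = -0.5 x^T H x + b^T x + const with prior x ~ N(0, K).
    Returns (mu, Sigma, log_evidence), where log_evidence omits `const`.
    """
    K_inv = np.linalg.inv(K)
    Sigma = np.linalg.inv(H + K_inv)  # equivalent to H = Sigma^{-1} - K^{-1}
    mu = Sigma @ b                    # posterior mean
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_K = np.linalg.slogdet(K)
    log_evidence = 0.5 * logdet_Sigma - 0.5 * logdet_K + 0.5 * b @ mu
    return mu, Sigma, log_evidence


# Sanity check on a case with a known answer: y = x + Gaussian noise.
rng = np.random.default_rng(0)
n = 5
t = np.arange(float(n))
# Assumed squared-exponential kernel with a small diagonal jitter.
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2) + 1e-3 * np.eye(n)
noise_var = 0.5
y = rng.normal(size=n)

# For Gaussian noise the log-likelihood is exactly quadratic in x.
H = np.eye(n) / noise_var
b = y / noise_var
const = -0.5 * y @ y / noise_var - 0.5 * n * np.log(2 * np.pi * noise_var)

mu, Sigma, log_ev = gaussian_posterior_and_evidence(K, H, b)

# Exact log marginal of y ~ N(0, K + noise_var * I) for comparison.
C = K + noise_var * np.eye(n)
exact = (-0.5 * y @ np.linalg.solve(C, y)
         - 0.5 * np.linalg.slogdet(C)[1]
         - 0.5 * n * np.log(2 * np.pi))
print(log_ev + const, exact)
```

For count models the quadratic form is only an approximation, so the resulting evidence is approximate as well; the Gaussian case is used here only because the exact answer is available for comparison.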

