Spike and slab variational Bayes for high dimensional logistic regression

NeurIPS 2020

Abstract

Variational Bayes (VB) is a popular scalable alternative to Markov chain Monte Carlo for Bayesian inference. We study a mean-field spike and slab VB approximation of widely used Bayesian model selection priors in sparse high-dimensional logistic regression. We provide non-asymptotic theoretical guarantees for the VB posterior in both $\ell_2$ and prediction loss.

Introduction
  • Let x ∈ Rp denote a feature vector and Y ∈ {0, 1} an associated binary label to be predicted.
  • In Bayesian logistic regression, one assigns a prior distribution to θ, giving a probabilistic model.
  • An especially natural Bayesian way to model sparsity is via a model selection prior, which assigns probabilistic weights to every potential model, i.e. every subset of {1, . . . , p} corresponding to selecting the non-zero coordinates of θ ∈ Rp (a schematic form of the model and prior is sketched after this list).
  • This is a widely used Bayesian approach and includes the hugely popular spike and slab prior [17, 31]
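For concreteness, the model and prior sketched in the bullets above can be written out as follows. This is a schematic restatement (the Laplace slab and Beta hyperprior are those stated in the highlights below); it mirrors, but should not be read as a verbatim copy of, equations (1) and (2) of the paper.

    % Logistic regression model, cf. (1)
    P(Y_i = 1 \mid x_i, \theta) = \frac{1}{1 + e^{-x_i^{T}\theta}}, \qquad i = 1, \dots, n,
    % Spike and slab prior on the coefficients, cf. (2)
    \theta_j \mid w \overset{\mathrm{iid}}{\sim} (1 - w)\,\delta_0 + w\,\mathrm{Lap}(\lambda), \qquad j = 1, \dots, p,
    % Hyperprior on the inclusion weight
    w \sim \mathrm{Beta}(a_0, b_0).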
Highlights
  • Let x ∈ Rp denote a feature vector and Y ∈ {0, 1} an associated binary label to be predicted
  • We further demonstrate that our Variational Bayes (VB) algorithm is empirically competitive with other state-of-the-art Bayesian sparse variable selection methods for logistic regression
  • We provide theoretical guarantees for the VB posterior Q∗ in (6)
  • We present a coordinate-ascent variational inference (CAVI) algorithm to compute the VB posterior Q∗ in (6). Consider the prior (2) with (θj)j=1,...,p ∼ iid (1 − w)δ0 + w Lap(λ) and hyperprior w ∼ Beta(a0, b0); introducing binary latent variables (zj)j=1,...,p, this spike and slab prior has a hierarchical representation (a simulation sketch of this hierarchical form follows the highlights).
  • This paper investigates a scalable and interpretable mean-field variational approximation of the popular spike and slab prior with Laplace slabs in high-dimensional logistic regression
  • We confirm the improved performance of our VB algorithm over common sparse VB approaches in a numerical study
  • The proposed approach performs comparably with other state-of-the-art sparse high-dimensional Bayesian variable selection methods for logistic regression, but scales substantially better to high-dimensional models where other approaches based on the EM algorithm or Markov chain Monte Carlo (MCMC) are not computable
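As a concrete illustration of the hierarchical spike and slab representation mentioned in the highlights, the following minimal Python sketch simulates a data set from it. The dimensions and hyperparameter values (a0, b0, λ) are illustrative placeholders, not the paper's recommended choices.

    # Simulate from the hierarchical spike and slab representation:
    # w ~ Beta(a0, b0), z_j | w ~ Bernoulli(w), theta_j = z_j * Lap(lambda),
    # Y_i | x_i, theta ~ Bernoulli(1 / (1 + exp(-x_i^T theta))).
    import numpy as np

    rng = np.random.default_rng(0)

    n, p = 100, 400              # sample size and (high) dimension, illustrative
    a0, b0 = 1.0, float(p)       # Beta hyperparameters (assumed values)
    lam = 1.0                    # Laplace slab rate parameter (assumed value)

    X = rng.standard_normal((n, p))                            # design matrix

    w = rng.beta(a0, b0)                                       # inclusion weight
    z = rng.binomial(1, w, size=p)                             # binary latent inclusion indicators
    theta = z * rng.laplace(loc=0.0, scale=1.0 / lam, size=p)  # spike at 0, Laplace slab

    probs = 1.0 / (1.0 + np.exp(-X @ theta))                   # logistic regression probabilities
    Y = rng.binomial(1, probs)                                 # binary labels

    print("inclusion weight w =", round(w, 4), "| non-zero coordinates:", int(z.sum()))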
Methods
  • Design matrix and sparsity assumptions

    In the high-dimensional case p > n, the parameter θ in model (1) is not identifiable, let alone estimable, without additional conditions on the design matrix X.
  • A sufficient condition for consistent estimation is ‘local invertibility’ of XᵀX when restricted to sparse vectors.
  • Define the diagonal matrix W ∈ Rn×n with ith diagonal entry.
  • For s ∈ {1, . . . , p}, set κ(s) = sup{‖Xθ‖₂²/‖θ‖₂² : |Sθ| ≤ s} (up to the paper's normalization), a restricted-eigenvalue-type quantity over s-sparse vectors (a brute-force illustration of such restricted invertibility is sketched after this list).
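The ‘local invertibility’ condition above can be illustrated by brute force in toy examples. The sketch below assumes a naive reading of the condition, namely that the smallest eigenvalue of X_Sᵀ X_S stays bounded away from zero over all small supports S; it is not the paper's exact definition of κ(s) and is only feasible for small p and s.

    # Brute-force check of restricted invertibility of X^T X over s-sparse supports
    # (naive, assumed reading of the condition; cost is exponential in s).
    from itertools import combinations

    import numpy as np

    def restricted_min_eigenvalue(X, s):
        """Smallest eigenvalue of X_S^T X_S over all supports S with |S| = s."""
        p = X.shape[1]
        worst = np.inf
        for S in combinations(range(p), s):
            XS = X[:, list(S)]
            worst = min(worst, np.linalg.eigvalsh(XS.T @ XS).min())
        return worst

    rng = np.random.default_rng(1)
    X = rng.standard_normal((50, 12))    # small illustrative design
    for s in (1, 2, 3):
        print(f"s = {s}: min eigenvalue over s-sparse supports = {restricted_min_eigenvalue(X, s):.3f}")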
Results
  • In Table 1 and the additional simulations in Section 8, the authors see that using Laplace slabs in the prior (9) generally outperforms the commonly used Gaussian slabs on all statistical metrics (l2-loss, MPSE, FDR), in some cases substantially so; one plausible reading of these metrics is sketched after this list.
  • This highlights the empirical advantages of using Laplace rather than Gaussian slabs for the prior underlying the VB approximation and matches the theory presented in Section 3, as well as similar observations in linear regression [42].
  • The optimization routines required in Algorithm 1 mean that a naive implementation can significantly increase the run-time; the authors are currently working on a more efficient implementation as an R package, sparsevb [16], which should reduce the run-time by at least an order of magnitude.
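For reference, the sketch below spells out one plausible reading of the evaluation metrics named above (l2-loss, MPSE, FDR). The exact conventions are assumptions: MPSE is taken here to be the mean squared error of the predicted success probabilities, and FDR is computed from the estimated support; the paper's definitions may differ in detail.

    # Hypothetical implementations of the reported metrics (assumed conventions).
    import numpy as np

    def sigmoid(t):
        return 1.0 / (1.0 + np.exp(-t))

    def l2_loss(theta_hat, theta_true):
        # Euclidean distance between estimated and true coefficient vectors.
        return float(np.linalg.norm(theta_hat - theta_true))

    def mpse(X, theta_hat, theta_true):
        # Mean squared error of predicted probabilities (assumed definition of MPSE).
        return float(np.mean((sigmoid(X @ theta_hat) - sigmoid(X @ theta_true)) ** 2))

    def fdr(theta_hat, theta_true, tol=1e-8):
        # False discovery rate of the estimated support.
        selected = np.abs(theta_hat) > tol
        false_pos = np.sum(selected & (np.abs(theta_true) <= tol))
        return false_pos / max(int(selected.sum()), 1)

    # Tiny usage example with a hypothetical estimate.
    rng = np.random.default_rng(2)
    X = rng.standard_normal((30, 10))
    theta_true = np.zeros(10)
    theta_true[:2] = [2.0, -1.5]
    theta_hat = theta_true.copy()
    theta_hat[:3] += 0.1 * rng.standard_normal(3)   # small error plus one false discovery
    print(l2_loss(theta_hat, theta_true), mpse(X, theta_hat, theta_true), fdr(theta_hat, theta_true))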
Conclusion
  • This paper investigates a scalable and interpretable mean-field variational approximation of the popular spike and slab prior with Laplace slabs in high-dimensional logistic regression.
  • The results derived here are the first steps towards better understanding VB methods in sparse high-dimensional nonlinear models.
  • This opens up several interesting future lines of research for applying scalable VB implementations of spike and slab priors in complex high-dimensional models, including Bayesian neural networks [39], graphical models [26] and high-dimensional Bayesian time series [44].
  • Since the results have no specific applications in mind, seeking rather to explain and improve an existing method, any potential broader impact will derive from improved performance in fields where such methods are already used.
Tables
  • Table 1: Comparing sparse Bayesian methods in high-dimensional logistic regression
  • Table 2: Marginal VB credible intervals for individual features
  • Table 3:
  • Table 4: Varying the scale hyperparameter
Funding
  • Botond Szabó received funding from the Netherlands Organization for Scientific Research (NWO) under Project number: 639.031.654
Study subjects and analysis
Test cases: 4
In view of the excellent FDR control of our VB method in earlier simulations, we further investigate the performance of these marginal credible sets empirically. We consider four test cases, consisting of the above example (Test 0) and Tests 1–3 from Section 8.2. In each case, we computed 95% marginal credible intervals for the coefficients, i.e. the intervals Ij, j = 1, . . . , p, of smallest length such that Q∗(θj ∈ Ij) ≥ 0.95.
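As an illustration of how such smallest-length marginal credible intervals can be computed, the sketch below assumes (as in the related sparse VB work [42]) that the variational marginal for θj is a mixture γj N(μj, σj²) + (1 − γj)δ0, and finds the shortest interval containing at least 95% of Monte Carlo draws from it. This is a hypothetical construction for illustration, not the paper's exact procedure.

    # Smallest-length interval I_j with Q*(theta_j in I_j) >= level, approximated
    # from Monte Carlo draws of an assumed spike-and-slab variational marginal.
    import numpy as np

    def marginal_credible_interval(gamma, mu, sigma, level=0.95, n_draws=100_000, seed=0):
        rng = np.random.default_rng(seed)
        slab = rng.random(n_draws) < gamma                    # inclusion indicator
        draws = np.where(slab, rng.normal(mu, sigma, n_draws), 0.0)
        draws.sort()
        k = int(np.ceil(level * n_draws))                     # number of draws the interval must cover
        widths = draws[k - 1:] - draws[: n_draws - k + 1]     # widths of all covering windows
        i = int(np.argmin(widths))                            # shortest window
        return draws[i], draws[i + k - 1]

    # A coordinate with high inclusion probability vs. one that is likely zero.
    print(marginal_credible_interval(gamma=0.98, mu=1.2, sigma=0.3))
    print(marginal_credible_interval(gamma=0.30, mu=0.8, sigma=0.3))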

References
  • [1] ALQUIER, P., AND RIDGWAY, J. Concentration of tempered posteriors and of their variational approximations. Ann. Statist. 48, 3 (2020), 1475–1497.
  • [2] ATCHADÉ, Y. A. On the contraction properties of some high-dimensional quasi-posterior distributions. Ann. Statist. 45, 5 (2017), 2248–2273.
  • [3] BACH, F. Self-concordant analysis for logistic regression. Electron. J. Stat. 4 (2010), 384–414.
  • [4] BANERJEE, S., CASTILLO, I., AND GHOSAL, S. Survey paper: Bayesian inference in high-dimensional models.
  • [5] BHARDWAJ, S., CURTIN, R. R., EDEL, M., MENTEKIDIS, Y., AND SANDERSON, C. ensmallen: a flexible C++ library for efficient function optimization, 2018.
  • [6] BISHOP, C. M. Pattern recognition and machine learning. Information Science and Statistics. Springer, New York, 2006.
  • [7] BLEI, D. M., KUCUKELBIR, A., AND MCAULIFFE, J. D. Variational inference: a review for statisticians. J. Amer. Statist. Assoc. 112, 518 (2017), 859–877.
  • [8] BOUCHERON, S., LUGOSI, G., AND MASSART, P. Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press, 2013.
  • [9] BÜHLMANN, P., AND VAN DE GEER, S. Statistics for high-dimensional data. Springer Series in Statistics. Springer, Heidelberg, 2011.
  • [10] CARBONETTO, P., AND STEPHENS, M. Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies. Bayesian Anal. 7, 1 (2012), 73–107.
  • [11] CARVALHO, C. M., POLSON, N. G., AND SCOTT, J. G. The horseshoe estimator for sparse signals. Biometrika 97, 2 (2010), 465–480.
  • [12] CASTILLO, I., AND ROQUAIN, E. On spike and slab empirical Bayes multiple testing. Ann. Statist. 48, 5 (2020), 2548–2574.
  • [13] CASTILLO, I., SCHMIDT-HIEBER, J., AND VAN DER VAART, A. Bayesian linear regression with sparse priors. Ann. Statist. 43, 5 (2015), 1986–2018.
  • [14] CASTILLO, I., AND SZABÓ, B. Spike and slab empirical Bayes sparse credible sets. Bernoulli 26, 1 (2020), 127–158.
  • [15] CASTILLO, I., AND VAN DER VAART, A. Needles and straw in a haystack: posterior concentration for possibly sparse sequences. Ann. Statist. 40, 4 (2012), 2069–2101.
  • [16] CLARA, G., SZABO, B., AND RAY, K. sparsevb: spike and slab variational Bayes for linear and logistic regression, 2020. R package version 1.0.
  • [17] GEORGE, E. I., AND MCCULLOCH, R. E. Variable selection via Gibbs sampling. Journal of the American Statistical Association 88, 423 (1993), 881–889.
  • [18] GHORBANI, B., JAVADI, H., AND MONTANARI, A. An instability in variational inference for topic models. arXiv e-prints (2018), arXiv:1802.00568.
  • [19] GHOSAL, S., GHOSH, J. K., AND VAN DER VAART, A. W. Convergence rates of posterior distributions. Ann. Statist. 28, 2 (2000), 500–531.
  • [20] GOODRICH, B., GABRY, J., ALI, I., AND BRILLEMAN, S. rstanarm: Bayesian applied regression modeling via Stan, 2020. R package version 2.19.3.
  • [21] HOFFMAN, M. D., AND GELMAN, A. The no-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15 (2014), 1593–1623.
  • [22] HORN, R. A., AND JOHNSON, C. R. Matrix analysis, second ed. Cambridge University Press, Cambridge, 2013.
  • [23] HUANG, X., WANG, J., AND LIANG, F. A variational algorithm for Bayesian variable selection. arXiv e-prints (2016), arXiv:1602.07640.
  • [24] JAAKKOLA, T. S., AND JORDAN, M. I. Bayesian parameter estimation via variational methods. Statistics and Computing 10, 1 (2000), 25–37.
  • [25] LI, Y.-H., SCARLETT, J., RAVIKUMAR, P., AND CEVHER, V. Sparsistency of l1-regularized M-estimators. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (2015), pp. 644–652.
  • [26] LI, Z. R., MCCORMICK, T. H., AND CLARK, S. J. Bayesian joint spike-and-slab graphical lasso. arXiv e-prints (2018), arXiv:1805.07051.
  • [27] LIU, D. C., AND NOCEDAL, J. On the limited memory BFGS method for large scale optimization. Mathematical Programming 45 (1989), 503–528.
  • [28] LOGSDON, B. A., HOFFMAN, G. E., AND MEZEY, J. G. A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis. BMC Bioinformatics 11, 1 (2010), 58.
  • [29] LU, Y., STUART, A., AND WEBER, H. Gaussian approximations for probability measures on Rd. SIAM/ASA J. Uncertain. Quantif. 5, 1 (2017), 1136–1165.
  • [30] MCDERMOTT, P., SNYDER, J., AND WILLISON, R. Methods for Bayesian variable selection with binary response data using the EM algorithm. arXiv e-prints (2016), arXiv:1605.05429.
  • [31] MITCHELL, T. J., AND BEAUCHAMP, J. J. Bayesian variable selection in linear regression. J. Amer. Statist. Assoc. 83, 404 (1988), 1023–1036.
  • [32] NARISETTY, N. N., SHEN, J., AND HE, X. Skinny Gibbs: a consistent and scalable Gibbs sampler for model selection. J. Amer. Statist. Assoc. 114, 527 (2019), 1205–1217.
  • [33] NEGAHBAN, S., YU, B., WAINWRIGHT, M. J., AND RAVIKUMAR, P. K. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In Advances in Neural Information Processing Systems 22.
  • [34] NEGAHBAN, S. N., RAVIKUMAR, P., WAINWRIGHT, M. J., AND YU, B. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statist. Sci. 27, 4 (2012), 538–557.
  • [35] NICKL, R., AND RAY, K. Nonparametric statistical inference for drift vector fields of multidimensional diffusions. Ann. Statist. 48, 3 (2020), 1383–1408.
  • [36] ORMEROD, J. T., YOU, C., AND MÜLLER, S. A variational Bayes approach to variable selection. Electron. J. Stat. 11, 2 (2017), 3549–3594.
  • [37] PAISLEY, J. W., BLEI, D. M., AND JORDAN, M. I. Variational Bayesian inference with stochastic search. In ICML (2012), icml.cc / Omnipress.
  • [38] PATI, D., BHATTACHARYA, A., AND YANG, Y. On statistical optimality of variational Bayes. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics (2018), pp. 1579–1588.
  • [39] POLSON, N. G., AND ROČKOVÁ, V. Posterior concentration for sparse deep learning. In Advances in Neural Information Processing Systems (2018), pp. 930–941.
  • [40] RAY, K. Adaptive Bernstein–von Mises theorems in Gaussian white noise. Ann. Statist. 45, 6 (2017), 2511–2536.
  • [41] RAY, K., AND SCHMIDT-HIEBER, J. Minimax theory for a class of nonlinear statistical inverse problems. Inverse Problems 32, 6 (2016), 065003, 29.
  • [42] RAY, K., AND SZABO, B. Variational Bayes for high-dimensional linear regression with sparse priors. arXiv e-prints (2019), arXiv:1904.07150.
  • [43] SANDERSON, C., AND CURTIN, R. Armadillo: a template-based C++ library for linear algebra. Journal of Open Source Software 1 (2016), 26.
  • [44] SCOTT, S. L., AND VARIAN, H. R. Bayesian variable selection for nowcasting economic time series. Tech. rep., National Bureau of Economic Research, 2013.
  • [45] SHETH, R., AND KHARDON, R. Excess risk bounds for the Bayes risk using variational inference in latent Gaussian models. In Advances in Neural Information Processing Systems 30 (2017), pp. 5151–5161.
  • [46] SZABÓ, B., AND VAN ZANTEN, H. An asymptotic analysis of distributed nonparametric methods. Journal of Machine Learning Research 20, 87 (2019), 1–30.
  • [47] TITSIAS, M., AND LAZARO-GREDILLA, M. Doubly stochastic variational Bayes for non-conjugate inference. In Proceedings of the 31st International Conference on Machine Learning (2014), pp. 1971–1979.
  • [48] TITSIAS, M. K., AND LÁZARO-GREDILLA, M. Spike and slab variational inference for multi-task and multiple kernel learning. In Advances in Neural Information Processing Systems (2011), pp. 2339–2347.
  • [49] VAN DER PAS, S., SZABÓ, B., AND VAN DER VAART, A. Uncertainty quantification for the horseshoe (with discussion). Bayesian Anal. 12, 4 (2017), 1221–1274. With a rejoinder by the authors.
  • [50] VAN ERVEN, T., AND SZABO, B. Fast exact Bayesian inference for sparse signals in the normal sequence model. Bayesian Anal., to appear (2020).
  • [51] WANG, B., AND TITTERINGTON, D. Convergence and asymptotic normality of variational Bayesian approximations for exponential family models with missing values. In Proceedings of the Twentieth Annual Conference on Uncertainty in Artificial Intelligence (UAI04) (2004), pp. 577–584.
  • [52] WANG, B., AND TITTERINGTON, D. M. Inadequacy of interval estimates corresponding to variational Bayesian approximations. In AISTATS05 (2004), pp. 373–380.
  • [53] WANG, C., AND BLEI, D. M. Variational inference in nonconjugate models. J. Mach. Learn. Res. 14 (2013), 1005–1031.
  • [54] WANG, Y., AND BLEI, D. Variational Bayes under model misspecification. In Advances in Neural Information Processing Systems 32 (2019), pp. 13357–13367.
  • [55] WANG, Y., AND BLEI, D. M. Frequentist consistency of variational Bayes. J. Amer. Statist. Assoc. 114, 527 (2019), 1147–1161.
  • [56] WEI, R., AND GHOSAL, S. Contraction properties of shrinkage priors in logistic regression. J. Statist. Plann. Inference 207 (2020), 215–229.
  • [57] YANG, Y., PATI, D., AND BHATTACHARYA, A. α-variational inference with statistical guarantees. Ann. Statist. 48, 2 (2020), 886–905.
  • [58] YI, N., TANG, Z., ZHANG, X., AND GUO, B. BhGLM: Bayesian hierarchical GLMs and survival models, with applications to genomics and epidemiology. Bioinformatics 35, 8 (2019), 1419–1421.
  • [59] ZHANG, A. Y., AND ZHOU, H. H. Theoretical and computational guarantees of mean field variational inference for community detection. Ann. Statist. 48, 5 (2020), 2575–2598.
  • [60] ZHANG, C.-X., XU, S., AND ZHANG, J.-S. A novel variational Bayesian method for variable selection in logistic regression models. Comput. Statist. Data Anal. 133 (2019), 1–19.
  • [61] ZHANG, F., AND GAO, C. Convergence rates of variational posterior distributions. Ann. Statist. 48, 4 (2020), 2180–2207.
  • [62] ZHAO, P., AND YU, B. On model selection consistency of Lasso. J. Mach. Learn. Res. 7 (2006), 2541–2563.
Authors
Kolyan Ray
Botond Szabo
Gabriel Clara