Minimax Bounds for Generalized Linear Models

NIPS 2020

Abstract

We establish a new class of minimax prediction error bounds for generalized linear models. Our bounds significantly improve previous results when the design matrix is poorly structured, including natural cases where the matrix is wide or does not have full column rank. Apart from the typical L2 risks, we study a class of entropic risks which ...

Introduction
  • Throughout, the authors consider a parametric framework where observations X ∈ R^n are generated according to X ∼ P_θ, where P_θ denotes a probability measure on a measurable space (X ⊆ R^n, F) indexed by an underlying parameter θ ∈ Θ ⊂ R^d.
  • First, the authors establish L2 minimax risk and entropic Bayes risk bounds for the generalized linear model (2); a sketch of this model is given after this list.
  • Second, the authors establish L2 minimax risk and entropic Bayes risk bounds for the Gaussian linear model.
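For concreteness, a common way to write the generalized linear model referred to as (2) is the exponential-family form below. The exact parameterization used in the paper may differ, so treat this as an assumed sketch rather than the paper's definition.

```latex
% Assumed exponential-family form of the generalized linear model "(2)":
% each coordinate X_i has natural parameter (M\theta)_i and cumulant (log-partition) function \Phi.
p_\theta(x) \;=\; \prod_{i=1}^{n} h(x_i)\,
  \exp\!\big( x_i\,(M\theta)_i \;-\; \Phi\big((M\theta)_i\big) \big),
\qquad \theta \in \Theta \subset \mathbb{R}^d,\; M \in \mathbb{R}^{n\times d}.
```

The Gaussian linear model is the special case in which Φ is quadratic, so that each X_i is Gaussian with mean (Mθ)_i.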
Highlights
  • Throughout, we consider a parametric framework where observations X ∈ R^n are generated according to X ∼ P_θ, where P_θ denotes a probability measure on a measurable space (X ⊆ R^n, F) indexed by an underlying parameter θ ∈ Θ ⊂ R^d
  • While we focus on L2 loss in the present work, we remark that minimax bounds on entropic loss directly yield corresponding estimates on Lp loss using standard arguments involving covering and packing numbers of Lp spaces
  • We establish L2 minimax risk and entropic Bayes risk bounds for the generalized linear model (2)
  • Our first main result establishes a minimax prediction lower bound corresponding to the generalized linear model (2)
  • For observations X ∈ R^n generated via the generalized linear model (2) with a fixed design matrix M ∈ R^{n×d}, the minimax L2 prediction risk and the entropic Bayes prediction risk are both lower bounded by an explicit quantity depending on M (the risk quantities themselves are written out, under mild assumptions on their form, in the sketch after this list)
  • Our bounds significantly improve previous results when the design matrix is poorly structured, including natural cases where the matrix is wide or does not have full column rank
  • For observations X ∈ R^n generated via the generalized linear model (2), with the additional sparsity constraint ‖θ‖₀ ≤ k (i.e., Θ := {θ ∈ R^d : ‖θ‖₀ ≤ k}), the minimax prediction error is lower bounded by an analogous explicit quantity
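The truncated displays in the bullets above presumably refer to risk quantities of the following standard form; this is a sketch under that assumption, and the right-hand sides of the paper's bounds are not reproduced here.

```latex
% Assumed forms of the risks referenced above (fixed design M \in \mathbb{R}^{n \times d}).
% L2 minimax prediction risk over all of R^d:
\inf_{\hat\theta}\ \sup_{\theta \in \mathbb{R}^d}\ \frac{1}{n}\,
   \mathbb{E}\big\| M\hat\theta - M\theta \big\|_2^2 \;\;\ge\;\; \cdots
% and its k-sparse analogue, with \Theta := \{\theta \in \mathbb{R}^d : \|\theta\|_0 \le k\}:
\qquad
\inf_{\hat\theta}\ \sup_{\theta \in \Theta}\ \frac{1}{n}\,
   \mathbb{E}\big\| M\hat\theta - M\theta \big\|_2^2 \;\;\ge\;\; \cdots
```

The entropic Bayes prediction risk replaces the squared-error loss above with a logarithmic (entropic) loss and the supremum with an average over a prior on θ.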
Results
  • The authors show that both the minimax risk and entropic Bayes risk bounds are tight up to constants and log factors when M is sampled from a Gaussian ensemble.
  • The authors' first main result establishes a minimax prediction lower bound corresponding to the generalized linear model (2).
  • For observations X ∈ R^n generated via the Gaussian linear model (8), with the sparsity constraint ‖θ‖₀ ≤ k (i.e., Θ := {θ ∈ R^d : ‖θ‖₀ ≤ k}), the minimax prediction error is lower bounded by an explicit quantity of the same type (a small numerical illustration of the unconstrained Gaussian case with a Gaussian-ensemble design follows this list).
  • Most relevant to the results is the following lower bound on minimax L2 estimation risk and entropic Bayes estimation risk, developed in a recent work by Lee and Courtade [23].
  • There is a large body of work establishing minimax lower bounds on prediction error for specific instances of the generalized linear model.
  • A popular minimax result is due to Raskutti et al. [18], who consider the sparse Gaussian linear model: for a fixed design matrix M with the sparsity constraint ‖θ‖₀ ≤ k, the minimax prediction error is lower bounded (up to constants) by σ² Φ_{2k,−}(M) · k log(ed/k), where Φ_{2k,−}(M) denotes a restricted eigenvalue-type quantity of M.
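As rough orientation for the Gaussian-ensemble setting mentioned above, the following Python sketch is not taken from the paper: it simulates the unconstrained Gaussian linear model X = Mθ + ε with a wide Gaussian design, fits least squares via the pseudoinverse, and checks the in-sample prediction error against the familiar σ²·rank(M)/n scale. All names and parameters are illustrative.

```python
import numpy as np

# Illustrative only: unconstrained Gaussian linear model with a Gaussian-ensemble design.
rng = np.random.default_rng(0)
n, d, sigma = 200, 500, 1.0          # wide design (d > n), so M cannot have full column rank

M = rng.standard_normal((n, d))      # Gaussian ensemble design matrix
theta = rng.standard_normal(d)       # arbitrary ground-truth parameter
X = M @ theta + sigma * rng.standard_normal(n)   # observations X = M theta + noise

theta_hat = np.linalg.pinv(M) @ X    # least-squares fit via the pseudoinverse
pred_err = np.mean((M @ (theta_hat - theta)) ** 2)   # (1/n) ||M(theta_hat - theta)||^2

rank_M = np.linalg.matrix_rank(M)
print(f"rank(M)                    = {rank_M}")
print(f"empirical prediction error = {pred_err:.3f}")
print(f"sigma^2 * rank(M) / n      = {sigma**2 * rank_M / n:.3f}")
```

With d > n the fit interpolates the data and the prediction error concentrates near σ², i.e., the σ²·rank(M)/n level; structural constraints such as sparsity are what make smaller risk possible, which is the regime the sparse bounds above address.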
Conclusion
  • If the parameter θ has a log-concave prior π, the following lemma gives an upper bound on the mutual information I(θ; X) that depends on the covariance matrix Cov(θ) of θ.
  • In an alternative proof of Theorem 5 (Section 3.3), the authors use a tighter version of Lemma 11 that is specific to the Gaussian linear model; a standard bound of this flavor is sketched after this list.
  • While many previous approaches have focused on the Gaussian linear model, in this paper the authors establish minimax and Bayes risk lower bounds that hold uniformly over all statistical models within the GLM family.
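For orientation, the Gaussian linear model X = Mθ + Z with Z ∼ N(0, σ²I_n) admits a standard maximum-entropy bound that controls I(θ; X) through Cov(θ). This is a generic fact holding for any prior with finite covariance, shown here as an illustration of the type of covariance-dependent control the paper's lemma provides; it is not the paper's Lemma 11.

```latex
% Standard maximum-entropy bound for the Gaussian linear model (not the paper's Lemma 11).
% Model: X = M\theta + Z with Z \sim \mathcal{N}(0, \sigma^2 I_n) independent of \theta.
\begin{align*}
I(\theta; X) &= h(X) - h(X \mid \theta) = h(M\theta + Z) - h(Z) \\
             &\le \tfrac{1}{2}\log\det\!\Big(I_n + \tfrac{1}{\sigma^2}\, M\,\mathrm{Cov}(\theta)\, M^{\top}\Big),
\end{align*}
% since the Gaussian maximizes differential entropy for a given covariance,
% and Cov(X) = M Cov(\theta) M^T + \sigma^2 I_n.
```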
Tables
  • Table 1: Values of the quantities appearing in Γ(M) (defined in (7)) for different scenarios of the spectrum Λ_M = (λ1, ..., λd) of a fixed M ∈ R^{n×d}. The value t satisfies t ≥ 1. In each row, the bold entry marks the largest value.
Related work
  • Most relevant to our results is the following lower bound on minimax L2 estimation risk and entropic Bayes estimation risk, developed in a recent work by Lee and Courtade [23]. We note that [23] does not bound prediction loss (which is often of primary interest), as we have done in the present paper.

    Theorem 7 (Theorem 3, [23]). Let the observation X be generated via the generalized linear model defined in (2), with the additional structural constraint Θ = B₂^d(R) := {v ∈ R^d : ‖v‖₂ ≤ R}.

    Suppose the cumulant function Φ satisfies Φ′′ ≤ L for some constant L. Then the minimax estimation error is lower bounded as inf_{θ̂} sup_{θ∈Θ} E‖θ̂ − θ‖² ≥ ...
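For contrast with the prediction risks studied in the present paper, the estimation-risk object bounded in [23] and the prediction-risk object bounded here are, schematically, as follows; this is a sketch assuming the L2 risks take their usual forms.

```latex
% Estimation risk (the object bounded in [23]) vs. prediction risk (the object bounded here);
% the 1/n normalization on the prediction side follows the truncated displays above.
\mathcal{R}_{\mathrm{est}} \;=\; \inf_{\hat\theta}\,\sup_{\theta\in\Theta}\, \mathbb{E}\|\hat\theta-\theta\|_2^2,
\qquad
\mathcal{R}_{\mathrm{pred}} \;=\; \inf_{\hat\theta}\,\sup_{\theta\in\Theta}\, \frac{1}{n}\,\mathbb{E}\|M\hat\theta-M\theta\|_2^2 .
```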
Funding
  • This work was supported in part by NSF grants CCF-1704967, CCF-1750430, and CCF-0939370.
References
  • [1] C. M. Stein, “Estimation of the Mean of a Multivariate Normal Distribution,” The Annals of Statistics, pp. 1135–1151, 1981.
  • [2] J. Friedman, T. Hastie, and R. Tibshirani, “Sparse Inverse Covariance Estimation with the Graphical Lasso,” Biostatistics, vol. 9, no. 3, pp. 432–441, 2008.
  • [3] G. Lecué and S. Mendelson, “Minimax Rate of Convergence and the Performance of ERM in Phase Recovery,” arXiv preprint arXiv:1311.5024, 2013.
  • [4] T. T. Cai, X. Li, and Z. Ma, “Optimal Rates of Convergence for Noisy Sparse Phase Retrieval via Thresholded Wirtinger Flow,” The Annals of Statistics, vol. 44, no. 5, pp. 2221–2251, 2016.
  • [5] D. Du, F. K. Hwang, and F. Hwang, Combinatorial Group Testing and Its Applications, vol. 12. World Scientific, 2000.
  • [6] B. Hajek, S. Oh, and J. Xu, “Minimax-Optimal Inference from Partial Rankings,” in Advances in Neural Information Processing Systems, pp. 1475–1483, 2014.
  • [7] P. Tichavsky, C. H. Muravchik, and A. Nehorai, “Posterior Cramér-Rao Bounds for Discrete-Time Nonlinear Filtering,” IEEE Transactions on Signal Processing, vol. 46, no. 5, pp. 1386–1396, 1998.
  • [8] L. Paninski, “Convergence Properties of Some Spike-Triggered Analysis Techniques,” in Advances in Neural Information Processing Systems, pp. 189–196, 2003.
  • [9] J. Broder and P. Rusmevichientong, “Dynamic Pricing Under a General Parametric Choice Model,” Operations Research, vol. 60, no. 4, pp. 965–980, 2012.
  • [10] N. B. Shah, S. Balakrishnan, J. Bradley, A. Parekh, K. Ramchandran, and M. J. Wainwright, “Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 2049–2095, 2016.
  • [11] P. McCullagh, Generalized Linear Models. Routledge, 2019.
  • [12] J. A. Nelder and R. W. Wedderburn, “Generalized Linear Models,” Journal of the Royal Statistical Society: Series A (General), vol. 135, no. 3, pp. 370–384, 1972.
  • [13] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models. CRC Press, 2018.
  • [14] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge University Press, 2006.
  • [15] T. A. Courtade and R. D. Wesel, “Multiterminal Source Coding with an Entropy-Based Distortion Measure,” in 2011 IEEE International Symposium on Information Theory Proceedings, pp. 2040–2044, IEEE, 2011.
  • [16] J. Jiao, T. A. Courtade, K. Venkat, and T. Weissman, “Justification of Logarithmic Loss via the Benefit of Side Information,” IEEE Transactions on Information Theory, vol. 61, no. 10, pp. 5357–5365, 2015.
  • [17] Y. Wu, “Lecture Notes for Information-Theoretic Methods for High-Dimensional Statistics,” Lecture Notes for ECE598YW (UIUC), vol. 16, 2017.
  • [18] G. Raskutti, M. J. Wainwright, and B. Yu, “Minimax Rates of Estimation for High-Dimensional Linear Regression Over lq-Balls,” IEEE Transactions on Information Theory, vol. 57, no. 10, pp. 6976–6994, 2011.
  • [19] F. Abramovich and V. Grinshtein, “Model Selection and Minimax Estimation in Generalized Linear Models,” IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3721–3730, 2016.
  • [20] H.-G. Müller and U. Stadtmüller, “Generalized Functional Linear Models,” The Annals of Statistics, vol. 33, no. 2, pp. 774–805, 2005.
  • [21] S. N. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu, “A Unified Framework for High-Dimensional Analysis of M-estimators with Decomposable Regularizers,” Statistical Science, vol. 27, no. 4, pp. 538–557, 2012.
  • [22] P.-L. Loh and M. J. Wainwright, “Regularized M-estimators with Nonconvexity: Statistical and Algorithmic Theory for Local Optima,” The Journal of Machine Learning Research, vol. 16, no. 1, pp. 559–616, 2015.
  • [23] K.-Y. Lee and T. A. Courtade, “Linear Models are Most Favorable among Generalized Linear Models,” arXiv preprint, to appear in ISIT 2020.
  • [24] E. Aras, K.-Y. Lee, A. Pananjady, and T. A. Courtade, “A Family of Bayesian Cramér-Rao Bounds, and Consequences for Log-Concave Priors,” in 2019 IEEE International Symposium on Information Theory (ISIT), pp. 2699–2703, IEEE, 2019.
  • [25] X. Chen, A. Guntuboyina, and Y. Zhang, “On Bayes Risk Lower Bounds,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 7687–7744, 2016.
  • [26] M. J. Wainwright, High-Dimensional Statistics: A Non-Asymptotic Viewpoint, vol. 48. Cambridge University Press, 2019.
  • [27] T. T. Cai, C.-H. Zhang, and H. H. Zhou, “Optimal Rates of Convergence for Covariance Matrix Estimation,” The Annals of Statistics, vol. 38, no. 4, pp. 2118–2144, 2010.
  • [28] E. J. Candes and Y. Plan, “Tight Oracle Inequalities for Low-Rank Matrix Recovery From a Minimal Number of Noisy Random Measurements,” IEEE Transactions on Information Theory, vol. 57, no. 4, pp. 2342–2359, 2011.
  • [29] F. Bunea, A. B. Tsybakov, and M. H. Wegkamp, “Aggregation for Gaussian Regression,” The Annals of Statistics, vol. 35, no. 4, pp. 1674–1697, 2007.
  • [30] N. Verzelen, “Minimax Risks for Sparse Regressions: Ultra-High Dimensional Phenomenons,” Electronic Journal of Statistics, vol. 6, pp. 38–90, 2012.
  • [31] L. Birgé and P. Massart, “Gaussian Model Selection,” Journal of the European Mathematical Society, vol. 3, no. 3, pp. 203–268, 2001.
  • [32] L. Birgé and P. Massart, “Minimal Penalties for Gaussian Model Selection,” Probability Theory and Related Fields, vol. 138, no. 1-2, pp. 33–73, 2007.
  • [33] M. F. Duarte and Y. C. Eldar, “Structured Compressed Sensing: From Theory to Applications,” IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4053–4085, 2011.
  • [34] G. Raskutti, M. J. Wainwright, and B. Yu, “Restricted Eigenvalue Properties for Correlated Gaussian Designs,” Journal of Machine Learning Research, vol. 11, pp. 2241–2259, 2010.
  • [35] A. Javanmard and A. Montanari, “Debiasing the Lasso: Optimal Sample Size for Gaussian Designs,” The Annals of Statistics, vol. 46, no. 6A, pp. 2593–2622, 2018.
  • [36] Z.-D. Bai and Y.-Q. Yin, “Limit of the Smallest Eigenvalue of a Large Dimensional Sample Covariance Matrix,” in Advances in Statistics, pp. 108–127, World Scientific, 2008.
  • [37] M. Rudelson and R. Vershynin, “The Least Singular Value of a Random Square Matrix is O(n^{-1/2}),” arXiv preprint arXiv:0805.3407, 2008.
  • [38] S. J. Szarek, “Spaces with Large Distance to l^n_∞ and Random Matrices,” American Journal of Mathematics, vol. 112, no. 6, pp. 899–942, 1990.
  • [39] R. C. Thompson, “Principal Submatrices IX: Interlacing Inequalities for Singular Values of Submatrices,” Linear Algebra and its Applications, vol. 5, no. 1, pp. 1–12, 1972.
  • [40] F. Wei, “Upper Bound for Intermediate Singular Values of Random Matrices,” Journal of Mathematical Analysis and Applications, vol. 445, no. 2, pp. 1530–1547, 2017.
  • [41] R. J. Hanson and C. L. Lawson, “Extensions and Applications of the Householder Algorithm for Solving Linear Least Squares Problems,” Mathematics of Computation, pp. 787–812, 1969.
Author
Kuan-Yun Lee
Thomas Courtade