# Minimax Bounds for Generalized Linear Models

NeurIPS 2020


Abstract

We establish a new class of minimax prediction error bounds for generalized linear models. Our bounds significantly improve previous results when the design matrix is poorly structured, including natural cases where the matrix is wide or does not have full column rank. Apart from the typical L2 risks, we study a class of entropic risks wh…

Introduction

- Throughout, the authors consider a parametric framework where observations X ∈ Rn are generated according to X ∼ Pθ, where Pθ denotes a probability measure on a measurable space (X ⊆ Rn, F ) indexed by an underlying parameter θ ∈ Θ ⊂ Rd.
- First, the authors establish L2 minimax risk and entropic Bayes risk bounds for the generalized linear model (2).
- Second, the authors establish L2 minimax risk and entropic Bayes risk bounds for the Gaussian linear model.
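The generative setup above can be sketched concretely. The snippet below is an illustration only, not the paper's code: the Poisson link, the dimensions, and the scaling of M are choices made here for the example; the paper's model (2) covers any exponential family with cumulant Φ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5
M = rng.normal(size=(n, d)) / np.sqrt(n)  # fixed design matrix M in R^{n x d}
theta = rng.normal(size=d)                # unknown parameter theta in R^d

# Poisson GLM (one member of the exponential family covered by (2)):
# cumulant Phi(t) = e^t, so E[X_i] = Phi'((M theta)_i) = exp((M theta)_i).
X = rng.poisson(np.exp(M @ theta))
```

An estimator then observes only X and M and must predict the mean vector, which is what the prediction risks below measure.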

Highlights

- Throughout, we consider a parametric framework where observations X ∈ Rn are generated according to X ∼ Pθ, where Pθ denotes a probability measure on a measurable space (X ⊆ Rn, F ) indexed by an underlying parameter θ ∈ Θ ⊂ Rd
- While we focus on L2 loss in the present work, we remark that minimax bounds on entropic loss directly yield corresponding estimates on Lp loss via standard arguments involving covering and packing numbers of Lp spaces
- We establish L2 minimax risk and entropic Bayes risk bounds for the generalized linear model (2)
- Our first main result establishes a minimax prediction lower bound corresponding to the generalized linear model (2)
- For observations X ∈ Rn generated via the generalized linear model (2) with a fixed design matrix M ∈ Rn×d, the minimax L2 prediction risk and the entropic Bayes prediction risk, of the form (1/n) inf_θ̂ sup_{θ ∈ Rd} (⋯), are lower bounded
- Our bounds significantly improve previous results when the design matrix is poorly structured, including natural cases where the matrix is wide or does not have full column rank
- For observations X ∈ Rn generated via the generalized linear model (2), with the additional sparsity constraint ‖θ‖₀ ≤ k (i.e., Θ := {θ ∈ Rd : ‖θ‖₀ ≤ k}), the minimax prediction error, of the form (1/n) inf_θ̂ sup_{θ ∈ Θ} (⋯), is lower bounded
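The sparsity class Θ = {θ ∈ Rd : ‖θ‖₀ ≤ k} that appears in the highlights can be handled in practice by hard thresholding. The helper below is a generic illustration of that class, not code from the paper:

```python
import numpy as np

def project_l0(theta, k):
    """Euclidean projection onto {theta : ||theta||_0 <= k}:
    keep the k largest-magnitude coordinates, zero out the rest
    (hard thresholding)."""
    out = np.zeros_like(theta)
    keep = np.argsort(np.abs(theta))[-k:]  # indices of the k largest |theta_i|
    out[keep] = theta[keep]
    return out

v = np.array([0.1, -3.0, 0.5, 2.0])
p = project_l0(v, 2)  # keeps -3.0 and 2.0, zeros the rest
```

Hard thresholding is the standard way to enforce an L0 constraint exactly, in contrast to the L1 relaxations used by Lasso-type estimators.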

Results

- The authors show that both the minimax risk and entropic Bayes risk bounds are tight up to constants and log factors when M is sampled from a Gaussian ensemble.
- The authors' first main result establishes a minimax prediction lower bound corresponding to the generalized linear model (2).
- For observations X ∈ Rn generated via the generalized linear model (2) with a fixed design matrix M ∈ Rn×d, the minimax L2 prediction risk and the entropic Bayes prediction risk, of the form (1/n) inf_θ̂ sup_{θ ∈ Rd} (⋯), are lower bounded.
- For observations X ∈ Rn generated via the generalized linear model (2), with the additional sparsity constraint ‖θ‖₀ ≤ k (i.e., Θ := {θ ∈ Rd : ‖θ‖₀ ≤ k}), the minimax prediction error, of the form (1/n) inf_θ̂ sup_{θ ∈ Θ} (⋯), is lower bounded.
- For observations X ∈ Rn generated via the Gaussian linear model (8), with the sparsity constraint ‖θ‖₀ ≤ k (i.e., Θ := {θ ∈ Rd : ‖θ‖₀ ≤ k}), the minimax prediction error, of the form (1/n) inf_θ̂ sup_{θ ∈ Θ} (⋯), is lower bounded.
- Most relevant to the results is the following lower bound on minimax L2 estimation risk and entropic Bayes estimation risk, developed in a recent work by Lee and Courtade [23].
- There is a large body of work that establishes minimax lower bounds on prediction error for specific instances of the generalized linear model.
- A well-known minimax result is due to Raskutti et al. [18], who consider the sparse Gaussian linear model: for a fixed design matrix M with an additional sparsity constraint ‖θ‖₀ ≤ k, the lower bound involves the factors σ², Φ_{2k,−}(M), and k log(ed/k)
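The constant Φ_{2k,−}(M) is defined in the paper; as a hedged reading, restricted-eigenvalue-type constants of this kind can be brute-forced on tiny instances as a minimum singular value over column submatrices. The sketch below assumes that reading (the exact definition may differ) and also codes the k log(ed/k) rate shape from Raskutti et al. [18]:

```python
import itertools

import numpy as np

def restricted_min_singular(M, s):
    """Brute-force minimum singular value over all n x s column
    submatrices of M -- a plausible proxy (assumed here) for a
    restricted-eigenvalue-type constant such as Phi_{s,-}(M)."""
    n, d = M.shape
    return min(
        np.linalg.svd(M[:, list(cols)], compute_uv=False)[-1]
        for cols in itertools.combinations(range(d), s)
    )

def sparse_rate(sigma, k, d, n):
    # Rate shape sigma^2 * k * log(e*d/k) / n from Raskutti et al. [18].
    return sigma**2 * k * np.log(np.e * d / k) / n

# Orthonormal columns: every submatrix has all singular values equal to 1.
phi = restricted_min_singular(np.eye(4), 2)
```

The brute force is exponential in s, so it only serves as a sanity check on small examples; it does make concrete why wide or rank-deficient M can drive such constants to zero, the regime where the paper's bounds improve on prior work.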

Conclusion

- If the parameter θ has a log-concave prior π, the following lemma gives an upper bound on the mutual information I(θ; X) in terms of the covariance matrix Cov(θ) of θ.
- Section 3.3 ("An Alternative Proof of Theorem 5"): for the Gaussian linear model, the authors have the following tighter version of Lemma 11.
- While many previous approaches have focused on the Gaussian linear model, in this paper the authors establish minimax and Bayes risk lower bounds that hold uniformly over all statistical models within the GLM.
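The conclusion mentions bounding I(θ; X) via Cov(θ) for log-concave priors. For the special case of a Gaussian prior (which is log-concave) under the Gaussian linear model, the mutual information has an exact log-det expression; this is the standard Gaussian-channel formula, offered here as a sanity check rather than as the paper's lemma:

```python
import numpy as np

def gaussian_mi(M, cov_theta, sigma2):
    """I(theta; X) in nats for X = M @ theta + noise, with
    theta ~ N(0, cov_theta) and noise ~ N(0, sigma2 * I):
    the Gaussian-channel formula 0.5 * logdet(I + M cov_theta M^T / sigma2)."""
    n = M.shape[0]
    sign, logdet = np.linalg.slogdet(np.eye(n) + M @ cov_theta @ M.T / sigma2)
    return 0.5 * logdet

# Enlarging Cov(theta) strictly increases the information X carries about theta.
lo = gaussian_mi(np.eye(2), np.eye(2), 1.0)
hi = gaussian_mi(np.eye(2), 2 * np.eye(2), 1.0)
```

The monotone dependence on Cov(θ) visible here is exactly the kind of behavior an upper bound on I(θ; X) in terms of Cov(θ) captures.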

- Table1: Values of identities in Γ(M ) (defined in (7)) for different scenarios of ΛM = (λ1, . . . , λd) for fixed M ∈ Rn×d. The value t satisfies t 1. In each row, the bold item marks the largest value

Related work

- Most relevant to our results is the following lower bound on minimax L2 estimation risk and entropic Bayes estimation risk, developed in a recent work by Lee and Courtade [23]. We note that [23] does not bound prediction loss (which is often of primary interest), as we have done in the present paper.

Theorem 7 (Theorem 3, [23]). Let the observation X be generated via the generalized linear model defined in (2), with the additional structural constraint Θ = B_2^d(R) := {v ∈ Rd : ‖v‖₂ ≤ R}.

Suppose the cumulant function Φ satisfies Φ″ ≤ L for some constant L. Then, the minimax estimation error inf_θ̂ sup_{θ ∈ Θ} E‖θ̂ − θ‖² is lower bounded (by the expression given in Theorem 3 of [23]).
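If the smoothness condition is read as a bound on the second derivative of the cumulant (the standard reading for cumulant functions), the Bernoulli (logistic) GLM gives a concrete instance: Φ(t) = log(1 + e^t) has Φ″(t) = s(t)(1 − s(t)) with s the sigmoid, so L = 1/4 suffices. A quick numerical check:

```python
import numpy as np

def logistic_cumulant_pp(t):
    """Second derivative of the logistic cumulant Phi(t) = log(1 + e^t):
    Phi''(t) = s(t) * (1 - s(t)) with s the sigmoid, so Phi'' <= 1/4."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(t, dtype=float)))
    return s * (1.0 - s)

ts = np.linspace(-10.0, 10.0, 1001)
assert float(np.max(logistic_cumulant_pp(ts))) <= 0.25  # maximum at t = 0
```

The maximum 1/4 is attained at t = 0, where the sigmoid equals 1/2.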

Funding

- This work was supported in part by NSF grants CCF-1704967, CCF-1750430, and CCF-0939370.

References

- [1] C. M. Stein, “Estimation of the Mean of a Multivariate Normal Distribution,” The Annals of Statistics, pp. 1135–1151, 1981.
- [2] J. Friedman, T. Hastie, and R. Tibshirani, “Sparse Inverse Covariance Estimation with the Graphical Lasso,” Biostatistics, vol. 9, no. 3, pp. 432–441, 2008.
- [3] G. Lecué and S. Mendelson, “Minimax Rate of Convergence and the Performance of ERM in Phase Recovery,” arXiv preprint arXiv:1311.5024, 2013.
- [4] T. T. Cai, X. Li, Z. Ma, et al., “Optimal Rates of Convergence for Noisy Sparse Phase Retrieval via Thresholded Wirtinger Flow,” The Annals of Statistics, vol. 44, no. 5, pp. 2221–2251, 2016.
- [5] D. Du and F. K. Hwang, Combinatorial Group Testing and Its Applications, vol. 12. World Scientific, 2000.
- [6] B. Hajek, S. Oh, and J. Xu, “Minimax-Optimal Inference from Partial Rankings,” in Advances in Neural Information Processing Systems, pp. 1475–1483, 2014.
- [7] P. Tichavsky, C. H. Muravchik, and A. Nehorai, “Posterior Cramér-Rao Bounds for Discrete-Time Nonlinear Filtering,” IEEE Transactions on Signal Processing, vol. 46, no. 5, pp. 1386–1396, 1998.
- [8] L. Paninski, “Convergence Properties of Some Spike-Triggered Analysis Techniques,” in Advances in Neural Information Processing Systems, pp. 189–196, 2003.
- [9] J. Broder and P. Rusmevichientong, “Dynamic Pricing Under a General Parametric Choice Model,” Operations Research, vol. 60, no. 4, pp. 965–980, 2012.
- [10] N. B. Shah, S. Balakrishnan, J. Bradley, A. Parekh, K. Ramchandran, and M. J. Wainwright, “Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 2049–2095, 2016.
- [11] P. McCullagh, Generalized Linear Models. Routledge, 2019.
- [12] J. A. Nelder and R. W. Wedderburn, “Generalized Linear Models,” Journal of the Royal Statistical Society: Series A (General), vol. 135, no. 3, pp. 370–384, 1972.
- [13] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models. CRC Press, 2018.
- [14] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge University Press, 2006.
- [15] T. A. Courtade and R. D. Wesel, “Multiterminal Source Coding with an Entropy-Based Distortion Measure,” in 2011 IEEE International Symposium on Information Theory Proceedings, pp. 2040–2044, IEEE, 2011.
- [16] J. Jiao, T. A. Courtade, K. Venkat, and T. Weissman, “Justification of Logarithmic Loss Via the Benefit of Side Information,” IEEE Transactions on Information Theory, vol. 61, no. 10, pp. 5357–5365, 2015.
- [17] Y. Wu, “Lecture Notes for Information-Theoretic Methods for High-Dimensional Statistics,” Lecture Notes for ECE598YW (UIUC), vol. 16, 2017.
- [18] G. Raskutti, M. J. Wainwright, and B. Yu, “Minimax Rates of Estimation for High-Dimensional Linear Regression Over lq-Balls,” IEEE Transactions on Information Theory, vol. 57, no. 10, pp. 6976–6994, 2011.
- [19] F. Abramovich and V. Grinshtein, “Model Selection and Minimax Estimation in Generalized Linear Models,” IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3721–3730, 2016.
- [20] H.-G. Müller and U. Stadtmüller, “Generalized Functional Linear Models,” The Annals of Statistics, vol. 33, no. 2, pp. 774–805, 2005.
- [21] S. N. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu, “A Unified Framework for HighDimensional Analysis of M-estimators with Decomposable Regularizers,” Statistical Science, vol. 27, no. 4, pp. 538–557, 2012.
- [22] P.-L. Loh and M. J. Wainwright, “Regularized M-estimators with Nonconvexity: Statistical and Algorithmic Theory for Local Optima,” The Journal of Machine Learning Research, vol. 16, no. 1, pp. 559–616, 2015.
- [23] K.-Y. Lee and T. A. Courtade, “Linear Models are Most Favorable among Generalized Linear Models,” arXiv preprint, to appear in ISIT 2020.
- [24] E. Aras, K.-Y. Lee, A. Pananjady, and T. A. Courtade, “A Family of Bayesian Cramér-Rao Bounds, and Consequences for Log-Concave Priors,” in 2019 IEEE International Symposium on Information Theory (ISIT), pp. 2699–2703, IEEE, 2019.
- [25] X. Chen, A. Guntuboyina, and Y. Zhang, “On Bayes Risk Lower Bounds,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 7687–7744, 2016.
- [26] M. J. Wainwright, High-Dimensional Statistics: A Non-Asymptotic Viewpoint, vol. 48. Cambridge University Press, 2019.
- [27] T. T. Cai, C.-H. Zhang, H. H. Zhou, et al., “Optimal Rates of Convergence for Covariance Matrix Estimation,” The Annals of Statistics, vol. 38, no. 4, pp. 2118–2144, 2010.
- [28] E. J. Candes and Y. Plan, “Tight Oracle Inequalities for Low-Rank Matrix Recovery From a Minimal Number of Noisy Random Measurements,” IEEE Transactions on Information Theory, vol. 57, no. 4, pp. 2342–2359, 2011.
- [29] F. Bunea, A. B. Tsybakov, M. H. Wegkamp, et al., “Aggregation for Gaussian Regression,” The Annals of Statistics, vol. 35, no. 4, pp. 1674–1697, 2007.
- [30] N. Verzelen, “Minimax Risks for Sparse Regressions: Ultra-High Dimensional Phenomenons,” Electronic Journal of Statistics, vol. 6, pp. 38–90, 2012.
- [31] L. Birgé and P. Massart, “Gaussian Model Selection,” Journal of the European Mathematical Society, vol. 3, no. 3, pp. 203–268, 2001.
- [32] L. Birgé and P. Massart, “Minimal Penalties for Gaussian Model Selection,” Probability Theory and Related Fields, vol. 138, no. 1-2, pp. 33–73, 2007.
- [33] M. F. Duarte and Y. C. Eldar, “Structured Compressed Sensing: From Theory to Applications,” IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4053–4085, 2011.
- [34] G. Raskutti, M. J. Wainwright, and B. Yu, “Restricted Eigenvalue Properties for Correlated Gaussian Designs,” Journal of Machine Learning Research, vol. 11, no. Aug, pp. 2241–2259, 2010.
- [35] A. Javanmard, A. Montanari, et al., “Debiasing the Lasso: Optimal Sample Size for Gaussian Designs,” The Annals of Statistics, vol. 46, no. 6A, pp. 2593–2622, 2018.
- [36] Z.-D. Bai and Y.-Q. Yin, “Limit of the Smallest Eigenvalue of a Large Dimensional Sample Covariance Matrix,” in Advances In Statistics, pp. 108–127, World Scientific, 2008.
- [37] M. Rudelson and R. Vershynin, “The Least Singular Value of a Random Square Matrix is O(n−1/2),” arXiv preprint arXiv:0805.3407, 2008.
- [38] S. J. Szarek, “Spaces with Large Distance to ℓ∞ⁿ and Random Matrices,” American Journal of Mathematics, vol. 112, no. 6, pp. 899–942, 1990.
- [39] R. C. Thompson, “Principal Submatrices IX: Interlacing Inequalities for Singular Values of Submatrices,” Linear Algebra and its Applications, vol. 5, no. 1, pp. 1–12, 1972.
- [40] F. Wei, “Upper Bound for Intermediate Singular Values of Random Matrices,” Journal of Mathematical Analysis and Applications, vol. 445, no. 2, pp. 1530–1547, 2017.
- [41] R. J. Hanson and C. L. Lawson, “Extensions and Applications of the Householder Algorithm for Solving Linear Least Squares Problems,” Mathematics of Computation, pp. 787–812, 1969.
