# Local Maxima in the Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences

Advances in Neural Information Processing Systems 29 (NIPS 2016): 4116–4124


Abstract

We provide two fundamental results on the population (infinite-sample) likelihood function of Gaussian mixture models with M >= 3 components. Our first main result shows that the population likelihood function has bad local maxima even in the special case of equally-weighted mixtures of well-separated and spherical Gaussians. We prove that …


Introduction

- Finite mixture models are widely used in a variety of statistical settings: as models for heterogeneous populations, as flexible models for multivariate density estimation, and as models for clustering.
- Their ability to model data as arising from underlying subpopulations provides essential flexibility in a wide range of applications [Titterington, 1985].
- The combinatorial structure of mixtures creates challenges for statistical and computational theory, and many problems associated with the estimation of finite mixtures remain open.
- The authors focus on the idealized situation in which every mixture component is equally weighted, and the covariance of each mixture component is the identity.
- This leads to a mixture model of the form $p(x \mid \mu^*) := \frac{1}{M} \sum_{i=1}^{M} \phi(x \mid \mu^*_i, I)$, where $\phi(\cdot \mid \mu, \Sigma)$ denotes the Gaussian density with mean $\mu$ and covariance $\Sigma$.
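The density above is simple to evaluate directly. A minimal sketch (the function name and NumPy-based layout are illustrative, not from the paper):

```python
import numpy as np

def gmm_density(x, means):
    """Density p(x | mu*) of an equally weighted mixture of spherical
    unit-variance Gaussians: (1/M) * sum_i phi(x | mu*_i, I)."""
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)   # shape (M, d)
    M, d = means.shape
    # Squared distances ||x - mu*_i||^2, one per component.
    sq_dists = np.sum((x - means) ** 2, axis=1)
    # phi(x | mu, I) = (2*pi)^(-d/2) * exp(-||x - mu||^2 / 2)
    phi = (2 * np.pi) ** (-d / 2) * np.exp(-sq_dists / 2)
    return phi.mean()  # uniform mixing weights 1/M
```

For well-separated means, the density near one center is dominated by that component's term, which is what makes the separated regime feel benign even though the paper shows its likelihood surface is not.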

Highlights

- Finite mixture models are widely used in a variety of statistical settings: as models for heterogeneous populations, as flexible models for multivariate density estimation, and as models for clustering.
- Their ability to model data as arising from underlying subpopulations provides essential flexibility in a wide range of applications [Titterington, 1985]. The combinatorial structure of mixtures creates challenges for statistical and computational theory, and many problems associated with the estimation of finite mixtures remain open. These problems are often studied in the setting of Gaussian mixture models (GMMs), reflecting the wide use of GMMs in applications, particularly in the multivariate setting, and this setting is the focus of the current paper.
- We further prove that, under the same random initialization scheme, the first-order EM algorithm with a suitable stepsize converges to a strict saddle point with probability zero.
- In Section 2, we introduce Gaussian mixture models, the EM algorithm, and its first-order variant, and we formally set up the problem we consider.
- We further provided some evidence that even in this favorable setting random initialization schemes for the population EM algorithm are likely to fail with high probability
- We believe that at least three mixture components are necessary for the log-likelihood to be poorly behaved, and that for a well-separated mixture of two Gaussians the EM algorithm with a random initialization is successful with high probability

Conclusion

**Conclusion and open problems**

- In this paper, the authors resolved an open problem of Srebro [2007] by demonstrating the existence of arbitrarily bad local maxima for the population log-likelihood of Gaussian mixture models, even in the idealized situation where each component is uniformly weighted, spherical with unit variance, and well-separated.
- The authors further provided some evidence that, even in this favorable setting, random initialization schemes for the population EM algorithm are likely to fail with high probability.
- The authors believe that at least three mixture components are necessary for the log-likelihood to be poorly behaved, and that for a well-separated mixture of two Gaussians the EM algorithm with a random initialization is successful with high probability.

Funding

- This work was partially supported by Office of Naval Research MURI grant DOD-002888, Air Force Office of Scientific Research grant AFOSR-FA9550-14-1-001, the Mathematical Data Science program of the Office of Naval Research under grant number N00014-15-1-2670, and National Science Foundation grant CIF-31712-23800.

References

- Elizabeth S Allman, Catherine Matias, and John A Rhodes. Identifiability of parameters in latent structure models with many observed variables. Annals of Statistics, 37(6A):3099–3132, 2009.
- Sanjeev Arora, Ravi Kannan, et al. Learning mixtures of separated nonspherical Gaussians. The Annals of Applied Probability, 15(1A):69–92, 2005.
- Sivaraman Balakrishnan, Martin J Wainwright, and Bin Yu. Statistical guarantees for the EM algorithm: From population to sample-based analysis. Annals of Statistics, 2015.
- Mikhail Belkin and Kaushik Sinha. Polynomial learning of distribution families. In 51st Annual IEEE Symposium on Foundations of Computer Science, pages 103–112. IEEE, 2010.
- Kamalika Chaudhuri and Satish Rao. Learning mixtures of product distributions using correlations and independence. In 21st Annual Conference on Learning Theory, volume 4, pages 9–1, 2008.
- Jiahua Chen. Optimal rate of convergence for finite mixture models. Annals of Statistics, 23(1):221–233, 1995.
- Sanjoy Dasgupta and Leonard Schulman. A probabilistic analysis of EM for mixtures of separated, spherical Gaussians. Journal of Machine Learning Research, 8:203–226, 2007.
- Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
- David L Donoho and Richard C Liu. The “automatic” robustness of minimum distance functionals. Annals of Statistics, 16(2):552–586, 1988.
- Rong Ge, Furong Huang, Chi Jin, and Yang Yuan. Escaping from saddle points—online stochastic gradient for tensor decomposition. In 28th Annual Conference on Learning Theory, pages 797–842, 2015.
- Christopher R Genovese and Larry Wasserman. Rates of convergence for the Gaussian mixture sieve. Annals of Statistics, 28(4):1105–1127, 2000.
- Subhashis Ghosal and Aad W Van Der Vaart. Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. Annals of Statistics, 29(5):1233–1263, 2001.
- Nhat Ho and XuanLong Nguyen. Identifiability and optimal rates of convergence for parameters of multiple types in finite mixtures. arXiv preprint arXiv:1501.02497, 2015.
- Daniel Hsu and Sham M Kakade. Learning mixtures of spherical Gaussians: Moment methods and spectral decompositions. In Proceedings of the 4th Conference on Innovations in Theoretical Computer Science, pages 11–20. ACM, 2013.
- Jason D Lee, Max Simchowitz, Michael I Jordan, and Benjamin Recht. Gradient descent converges to minimizers. In 29th Annual Conference on Learning Theory, pages 1246–1257, 2016.
- Po-Ling Loh and Martin J Wainwright. Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. In Advances in Neural Information Processing Systems, pages 476–484, 2013.
- Ankur Moitra and Gregory Valiant. Settling the polynomial learnability of mixtures of Gaussians. In 51st Annual IEEE Symposium on Foundations of Computer Science, pages 93–102. IEEE, 2010.
- Ioannis Panageas and Georgios Piliouras. Gradient descent converges to minimizers: The case of non-isolated critical points. arXiv preprint arXiv:1605.00405, 2016.
- Razvan Pascanu, Yann N Dauphin, Surya Ganguli, and Yoshua Bengio. On the saddle point problem for non-convex optimization. arXiv preprint arXiv:1405.4604, 2014.
- Nathan Srebro. Are there local maxima in the infinite-sample likelihood of Gaussian mixture estimation? In 20th Annual Conference on Learning Theory, pages 628–629, 2007.
- Henry Teicher. Identifiability of finite mixtures. The Annals of Mathematical Statistics, 34(4):1265–1269, 1963.
- D Michael Titterington. Statistical Analysis of Finite Mixture Distributions. Wiley, 1985.
- Santosh Vempala and Grant Wang. A spectral algorithm for learning mixtures of distributions. In The 43rd Annual IEEE Symposium on Foundations of Computer Science, pages 113–122. IEEE, 2002.
- Zhaoran Wang, Han Liu, and Tong Zhang. Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Annals of Statistics, 42(6):2164, 2014.
