Local Maxima in the Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences.

Advances in Neural Information Processing Systems 29 (NIPS 2016), 2016: 4116-4124

Abstract

We provide two fundamental results on the population (infinite-sample) likelihood function of Gaussian mixture models with M >= 3 components. Our first main result shows that the population likelihood function has bad local maxima even in the special case of equally-weighted mixtures of well-separated and spherical Gaussians. We prove tha...

Introduction
  • Finite mixture models are widely used in a variety of statistical settings: as models for heterogeneous populations, as flexible models for multivariate density estimation, and as models for clustering.
  • Their ability to model data as arising from underlying subpopulations provides essential flexibility in a wide range of applications (Titterington [1985]).
  • This combinatorial structure (the unobserved assignment of each observation to a component) creates challenges for statistical and computational theory, and many problems associated with the estimation of finite mixtures remain open.
  • The authors focus on the idealized situation in which every mixture component is equally weighted and the covariance of each mixture component is the identity.
  • This leads to a mixture model of the form p(x | μ∗) := (1/M) Σ_{i=1}^{M} φ(x | μ∗_i, I), where φ(· | μ, I) denotes the Gaussian density with mean μ and identity covariance (a numerical sketch of this density follows this list).
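To make the model concrete, here is a minimal numerical sketch, assuming Python with NumPy; the function names (gmm_density, average_log_likelihood), the toy means, and the sample size are illustrative choices and are not taken from the paper.

```python
import numpy as np

def gmm_density(x, mus):
    """Density of an equally weighted mixture of spherical, unit-variance
    Gaussians: p(x | mu) = (1/M) * sum_i phi(x | mu_i, I)."""
    mus = np.atleast_2d(mus)                  # shape (M, d)
    x = np.asarray(x, dtype=float)            # shape (d,)
    d = mus.shape[1]
    sq_dists = np.sum((x - mus) ** 2, axis=1)             # ||x - mu_i||^2 per component
    comps = np.exp(-0.5 * sq_dists) / (2 * np.pi) ** (d / 2)
    return comps.mean()                                    # uniform weights 1/M

def average_log_likelihood(X, mus):
    """Average log-likelihood of a sample X (shape (n, d)) under the mixture."""
    return float(np.mean([np.log(gmm_density(x, mus)) for x in X]))

if __name__ == "__main__":
    # Toy usage: three well-separated components on the real line.
    true_mus = np.array([[-10.0], [0.0], [10.0]])
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 3, size=500)
    X = true_mus[labels] + rng.standard_normal((500, 1))
    print(average_log_likelihood(X, true_mus))
```

Because the weights and covariances are fixed in this idealized setting, the only unknown parameters are the M mean vectors, which is exactly the quantity the likelihood is maximized over.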
Highlights
  • Finite mixture models are widely used in a variety of statistical settings, as models for heterogeneous populations, as flexible models for multivariate density estimation, and as models for clustering
  • Their ability to model data as arising from underlying subpopulations provides essential flexibility in a wide range of applications (Titterington [1985]). This combinatorial structure creates challenges for statistical and computational theory, and many problems associated with the estimation of finite mixtures remain open. These problems are often studied in the setting of Gaussian mixture models (GMMs), reflecting the wide use of GMMs in applications, particularly in the multivariate setting, and this setting is our focus in the current paper
  • We further prove that, under the same random initialization scheme, the first-order EM algorithm with a suitable stepsize almost surely does not converge to a strict saddle point (see the sketch of the EM and first-order EM updates after this list)
  • In Section 2, we introduce Gaussian mixture models, the EM algorithm, and its first-order variant, and we formally set up the problem we consider
  • We further provided some evidence that, even in this favorable setting, random initialization schemes for the population EM algorithm are likely to fail with high probability
  • We believe that at least three mixture components are necessary for the log-likelihood to be poorly behaved, and that for a well-separated mixture of two Gaussians the EM algorithm with a random initialization is successful with high probability
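For this equally weighted, unit-variance model, the EM update has a simple closed form: the E-step computes the responsibility of component i for point x as φ(x | μ_i, I) normalized over components, and the M-step replaces each mean by the responsibility-weighted average of the data; the first-order variant instead takes a single gradient step on the log-likelihood with a chosen stepsize. The sketch below, assuming Python with NumPy, illustrates both updates; em_step, first_order_em_step, the stepsize of 0.1, the toy data, and the random-initialization scheme are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def responsibilities(X, mus):
    """E-step for an equally weighted, unit-variance spherical GMM:
    posterior probability that each point was generated by each component."""
    # Squared distances ||x_j - mu_i||^2, shape (n, M).
    sq = np.sum((X[:, None, :] - mus[None, :, :]) ** 2, axis=2)
    logits = -0.5 * sq
    logits -= logits.max(axis=1, keepdims=True)   # stabilize the softmax
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def em_step(X, mus):
    """Standard M-step: move each mean to the responsibility-weighted average."""
    w = responsibilities(X, mus)                  # shape (n, M)
    return (w.T @ X) / w.sum(axis=0)[:, None]

def first_order_em_step(X, mus, stepsize=0.1):
    """First-order EM: one gradient-ascent step on the average log-likelihood
    with respect to the means, instead of the exact M-step."""
    w = responsibilities(X, mus)
    n = X.shape[0]
    grad = (w.T @ X - w.sum(axis=0)[:, None] * mus) / n
    return mus + stepsize * grad

if __name__ == "__main__":
    # Toy usage: three well-separated 1-D components, random initialization.
    rng = np.random.default_rng(1)
    true_mus = np.array([[-10.0], [0.0], [10.0]])
    labels = rng.integers(0, 3, size=2000)
    X = true_mus[labels] + rng.standard_normal((2000, 1))
    mus = rng.uniform(-15.0, 15.0, size=(3, 1))   # one arbitrary initialization scheme
    for _ in range(300):
        mus = em_step(X, mus)
    # Re-running with different seeds illustrates the sensitivity to
    # initialization discussed in the highlights above.
    print(np.sort(mus, axis=0).ravel())
```

Note that only the means are updated; the mixture weights and covariances are held at their idealized values throughout, matching the setting described in the introduction.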
Conclusion
  • Conclusion and open problems

    In this paper, the authors resolved an open problem of Srebro [2007] by demonstrating the existence of arbitrarily bad local maxima for the population log-likelihood of Gaussian mixture models, even in the idealized situation where each component is uniformly weighted, spherical with unit variance, and well-separated.
  • The authors further provided some evidence that, even in this favorable setting, random initialization schemes for the population EM algorithm are likely to fail with high probability.
  • The authors believe that at least three mixture components are necessary for the log-likelihood to be poorly behaved, and that for a well-separated mixture of two Gaussians the EM algorithm with a random initialization is successful with high probability.
Funding
  • This work was partially supported by Office of Naval Research MURI grant DOD-002888, Air Force Office of Scientific Research Grant AFOSR-FA9550-14-1-001, the Mathematical Data Science program of the Office of Naval Research under grant number N00014-15-1-2670, and National Science Foundation Grant CIF-31712-23800
References
  • Elizabeth S Allman, Catherine Matias, and John A Rhodes. Identifiability of parameters in latent structure models with many observed variables. Annals of Statistics, 37(6A):3099–3132, 2009.
  • Sanjeev Arora, Ravi Kannan, et al. Learning mixtures of separated nonspherical Gaussians. The Annals of Applied Probability, 15(1A):69–92, 2005.
  • Sivaraman Balakrishnan, Martin J Wainwright, and Bin Yu. Statistical guarantees for the EM algorithm: From population to sample-based analysis. Annals of Statistics, 2015.
  • Mikhail Belkin and Kaushik Sinha. Polynomial learning of distribution families. In 51st Annual IEEE Symposium on Foundations of Computer Science, pages 103–112. IEEE, 2010.
  • Kamalika Chaudhuri and Satish Rao. Learning mixtures of product distributions using correlations and independence. In 21st Annual Conference on Learning Theory, volume 4, pages 9–1, 2008.
  • Jiahua Chen. Optimal rate of convergence for finite mixture models. Annals of Statistics, 23(1):221–233, 1995.
  • Sanjoy Dasgupta and Leonard Schulman. A probabilistic analysis of EM for mixtures of separated, spherical Gaussians. Journal of Machine Learning Research, 8:203–226, 2007.
  • Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
  • David L Donoho and Richard C Liu. The “automatic” robustness of minimum distance functionals. Annals of Statistics, 16(2):552–586, 1988.
  • Rong Ge, Furong Huang, Chi Jin, and Yang Yuan. Escaping from saddle points—online stochastic gradient for tensor decomposition. In 28th Annual Conference on Learning Theory, pages 797–842, 2015.
  • Christopher R Genovese and Larry Wasserman. Rates of convergence for the Gaussian mixture sieve. Annals of Statistics, 28(4):1105–1127, 2000.
  • Subhashis Ghosal and Aad W Van Der Vaart. Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. Annals of Statistics, 29(5):1233–1263, 2001.
  • Nhat Ho and XuanLong Nguyen. Identifiability and optimal rates of convergence for parameters of multiple types in finite mixtures. arXiv preprint arXiv:1501.02497, 2015.
  • Daniel Hsu and Sham M Kakade. Learning mixtures of spherical Gaussians: Moment methods and spectral decompositions. In Proceedings of the 4th Conference on Innovations in Theoretical Computer Science, pages 11–20. ACM, 2013.
  • Jason D Lee, Max Simchowitz, Michael I Jordan, and Benjamin Recht. Gradient descent converges to minimizers. In 29th Annual Conference on Learning Theory, pages 1246–1257, 2016.
  • Po-Ling Loh and Martin J Wainwright. Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. In Advances in Neural Information Processing Systems, pages 476–484, 2013.
  • Ankur Moitra and Gregory Valiant. Settling the polynomial learnability of mixtures of Gaussians. In 51st Annual IEEE Symposium on Foundations of Computer Science, pages 93–102. IEEE, 2010.
  • Ioannis Panageas and Georgios Piliouras. Gradient descent converges to minimizers: The case of non-isolated critical points. arXiv preprint arXiv:1605.00405, 2016.
  • Razvan Pascanu, Yann N Dauphin, Surya Ganguli, and Yoshua Bengio. On the saddle point problem for non-convex optimization. arXiv preprint arXiv:1405.4604, 2014.
  • Nathan Srebro. Are there local maxima in the infinite-sample likelihood of Gaussian mixture estimation? In 20th Annual Conference on Learning Theory, pages 628–629, 2007.
  • Henry Teicher. Identifiability of finite mixtures. The Annals of Mathematical Statistics, 34(4):1265–1269, 1963.
  • D Michael Titterington. Statistical Analysis of Finite Mixture Distributions. Wiley, 1985.
  • Santosh Vempala and Grant Wang. A spectral algorithm for learning mixtures of distributions. In The 43rd Annual IEEE Symposium on Foundations of Computer Science, pages 113–122. IEEE, 2002.
  • Zhaoran Wang, Han Liu, and Tong Zhang. Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Annals of Statistics, 42(6):2164, 2014.