
Tensor decompositions for learning latent variable models

Journal of Machine Learning Research, 15 (2014): 2773–2832

Abstract

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models--including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation--which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
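
In the paper's notation, the structure referred to above is an orthogonal decomposition of a (suitably transformed) third-order moment tensor, and the tensor power method generalizes the matrix power iteration. The key identities, stated in LaTeX:

    M_3 = \sum_{i=1}^{k} \lambda_i \, v_i \otimes v_i \otimes v_i, \qquad \{v_i\}_{i=1}^{k} \text{ orthonormal};

    v \mapsto \frac{M_3(I, v, v)}{\lVert M_3(I, v, v) \rVert}, \qquad \text{where } M_3(I, v, v) = \sum_{i=1}^{k} \lambda_i \langle v_i, v \rangle^2 \, v_i.

Each v_i is a fixed point of this map, and the paper establishes quadratic convergence of the iteration in a neighborhood of each v_i.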

Introduction
  • The method of moments is a classical parameter estimation technique (Pearson, 1894) from statistics which has proved invaluable in a number of application domains.
  • The primary difficulty in learning latent variable models is that the latent state of the data is not directly observed; only variables correlated with the hidden state are observed.
  • As such, it is not evident that the method of moments should fare any better than maximum likelihood in terms of computational performance: matching the model parameters to the observed moments may involve solving computationally intractable systems of multivariate polynomial equations.
  • What is more, these decomposition problems are often amenable to simple and efficient iterative methods, such as gradient descent and the power iteration method; a minimal sketch of the latter follows below.
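
    To make the power iteration concrete, the following is a minimal numpy sketch, assuming an exactly (orthogonally) decomposable symmetric tensor; it is an illustration only, not the paper's robust Algorithm 1, which adds multiple random restarts and other safeguards.

        # Power iteration and deflation for a symmetric tensor
        # T = sum_i lambda_i * v_i (x) v_i (x) v_i with orthonormal v_i.
        import numpy as np

        def tensor_apply(T, v):
            # Contract the third-order tensor in two modes: T(I, v, v).
            return np.einsum('ijk,j,k->i', T, v, v)

        def power_iteration(T, n_iters=100, seed=0):
            # From a random start, converges to some v_i (up to sign).
            rng = np.random.default_rng(seed)
            v = rng.normal(size=T.shape[0])
            v /= np.linalg.norm(v)
            for _ in range(n_iters):
                w = tensor_apply(T, v)
                v = w / np.linalg.norm(w)
            return tensor_apply(T, v) @ v, v   # lambda = T(v, v, v)

        def decompose(T, k):
            # Deflation: subtract each recovered rank-one term, repeat.
            pairs = []
            for _ in range(k):
                lam, v = power_iteration(T)
                pairs.append((lam, v))
                T = T - lam * np.einsum('i,j,k->ijk', v, v, v)
            return pairs

        # Sanity check on a synthetic orthogonal tensor.
        V, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(5, 3)))
        T = sum(lam * np.einsum('i,j,k->ijk', v, v, v)
                for lam, v in zip([3.0, 2.0, 1.0], V.T))
        print(sorted(round(lam, 6) for lam, _ in decompose(T, 3)))
        # -> [1.0, 2.0, 3.0] (components may be recovered in any order)
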
Highlights
  • The method of moments is a classical parameter estimation technique (Pearson, 1894) from statistics which has proved invaluable in a number of application domains
  • In a number of cases, the method of moments leads to consistent estimators which can be efficiently computed; this is especially relevant in the context of latent variable models, where standard maximum likelihood approaches are typically computationally prohibitive, and heuristic methods can be unreliable and difficult to validate with high-dimensional data
  • The method of moments can be viewed as complementary to the maximum likelihood approach; taking a single step of Newton-Raphson on the likelihood function starting from the moment-based estimator (Le Cam, 1986) often leads to the best of both worlds: a computationally efficient estimator that is statistically optimal
  • We discuss some practical and application-oriented issues related to the tensor decomposition approach to learning latent variable models
  • A number of practical concerns arise when dealing with moment matrices and tensors
  • The estimators obtained via Theorem 3.1 and Theorem 3.5 (LDA) use only up to third-order moments, which suggests that each document only needs to have three words; an illustrative moment estimate is sketched below.
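
    As an illustration of this point, the hypothetical sketch below forms the raw empirical third-order word co-occurrence moment E[x1 (x) x2 (x) x3] from just three words per document, with x_t the one-hot indicator of the t-th word; the actual LDA estimator of Theorem 3.5 additionally combines this with lower-order moment corrections, which are omitted here.

        # Raw empirical third-order moment from three words per document.
        import numpy as np

        def empirical_third_moment(docs, vocab_size):
            # docs: iterable of word-id sequences, each of length >= 3.
            T = np.zeros((vocab_size, vocab_size, vocab_size))
            for doc in docs:
                w1, w2, w3 = doc[:3]   # three words per document suffice
                T[w1, w2, w3] += 1.0
            return T / len(docs)

        docs = [[0, 1, 2], [0, 1, 3], [2, 1, 0, 3]]
        M3 = empirical_third_moment(docs, vocab_size=4)
        print(M3[0, 1, 2])   # 1/3: one of the three documents starts 0, 1, 2
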
Conclusion
  • The authors discuss some practical and application-oriented issues related to the tensor decomposition approach to learning latent variable models.

    6.1 Practical Implementation Considerations

    A number of practical concerns arise when dealing with moment matrices and tensors.
  • The words in a document are conditionally i.i.d. given the topic h.
  • This allows one to estimate p-th order moments using just p words per document.
  • For efficient estimation of the moments, however, one should use all of the words in a document, e.g., by averaging over all ordered triples of words in a document of length ℓ.
  • At first blush, this seems computationally expensive, but as it turns out, the averaging can be done implicitly, as shown by Zou et al. (2013); the sketch below illustrates the idea.
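
    The following sketch illustrates one way the averaging becomes implicit (an illustrative reduction, not Zou et al.'s full construction): the sum over all ordered triples of distinct word positions, contracted with a test vector u, can be computed from the document's word-count vector alone by inclusion-exclusion, in time linear in the vocabulary rather than O(ℓ³).

        # Implicit averaging over ordered triples of distinct positions.
        import numpy as np
        from itertools import permutations

        def triple_contraction(counts, u):
            # By inclusion-exclusion over coinciding positions,
            #   sum_{a,b,c distinct} <x_a,u><x_b,u><x_c,u>
            #     = (c.u)^3 - 3(c.u) sum_i c_i u_i^2 + 2 sum_i c_i u_i^3.
            cu = counts @ u
            return cu**3 - 3.0 * cu * (counts @ u**2) + 2.0 * (counts @ u**3)

        doc = [0, 2, 2, 1, 3]   # word ids of one document (length 5)
        counts = np.bincount(doc, minlength=4).astype(float)
        u = np.array([0.5, -1.0, 2.0, 0.3])

        brute = sum(u[doc[a]] * u[doc[b]] * u[doc[c]]
                    for a, b, c in permutations(range(len(doc)), 3))
        print(np.isclose(triple_contraction(counts, u), brute))   # True
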
Related Work
  • The connection between tensor decompositions and latent variable models has a long history across many scientific and mathematical disciplines. We review some of the key works that are most closely related to ours.

    1.2.1 Tensor Decompositions

    The role of tensor decompositions in the context of latent variable models dates back to early uses in psychometrics (Cattell, 1944). These ideas later gained popularity in chemometrics, and more recently in numerous science and engineering disciplines, including neuroscience, phylogenetics, signal processing, data mining, and computer vision. A thorough survey of these techniques and applications is given by Kolda and Bader (2009). Below, we discuss a few specific connections to two applications in machine learning and statistics, independent component analysis and latent variable models (between which there is also significant overlap).

    Tensor decompositions have been used in signal processing and computational neuroscience for blind source separation and independent component analysis (ICA) (Comon and Jutten, 2010). Here, statistically independent non-Gaussian sources are linearly mixed in the observed signal, and the goal is to recover the mixing matrix (and ultimately, the original source signals). A typical solution is to locate projections of the observed signals that correspond to local extrema of so-called “contrast functions”, which distinguish Gaussian variables from non-Gaussian variables. This method can be effectively implemented using fast descent algorithms (Hyvärinen, 1999). When the excess kurtosis (i.e., the fourth-order cumulant) is used as the contrast function, this method reduces to a generalization of the power method for symmetric tensors (Lathauwer et al., 2000; Zhang and Golub, 2001; Kofidis and Regalia, 2002). This case is particularly important, since all local extrema of the kurtosis objective correspond to the true sources (under the assumed statistical model) (Delfosse and Loubaton, 1995); the descent methods can therefore be rigorously analyzed, and their computational and statistical complexity can be bounded (Frieze et al., 1996; Nguyen and Regev, 2009; Arora et al., 2012b). A minimal sketch of the kurtosis-based power update appears below.
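
    The following is a minimal sketch of that kurtosis-based power update in the style of FastICA, assuming the observations have already been whitened; it is an illustration, not a faithful implementation of any single algorithm cited above.

        # Extraction of one independent component via the
        # fourth-order-cumulant (kurtosis) power update.
        import numpy as np

        def kurtosis_power_update(Z, w):
            # Z: (n_samples, d) whitened data; w: current unit vector.
            # Power step on the 4th-order cumulant: w <- E[(w.z)^3 z] - 3w.
            y = Z @ w
            w_new = (Z * (y ** 3)[:, None]).mean(axis=0) - 3.0 * w
            return w_new / np.linalg.norm(w_new)

        def extract_one_source(Z, n_iters=200, seed=0):
            rng = np.random.default_rng(seed)
            w = rng.normal(size=Z.shape[1])
            w /= np.linalg.norm(w)
            for _ in range(n_iters):
                w = kurtosis_power_update(Z, w)
            return w   # a column of the whitened mixing matrix, up to sign
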
Funding
  • AA is supported in part by the NSF Award CCF-1219234, AFOSR Award FA9550-10-1-0310 and the ARO Award W911NF-12-1-0404
References
  • D. Achlioptas and F. McSherry. On spectral learning of mixtures of distributions. In Eighteenth Annual Conference on Learning Theory, pages 458–469, 2005.
  • E. S. Allman, C. Matias, and J. A. Rhodes. Identifiability of parameters in latent structure models with many observed variables. The Annals of Statistics, 37(6A):3099–3132, 2009.
  • A. Anandkumar, D. P. Foster, D. Hsu, S. M. Kakade, and Y.-K. Liu. A spectral algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems 25, 2012a.
  • A. Anandkumar, D. Hsu, F. Huang, and S. M. Kakade. Learning mixtures of tree graphical models. In Advances in Neural Information Processing Systems 25, 2012b.
  • A. Anandkumar, D. Hsu, and S. M. Kakade. A method of moments for mixture models and hidden Markov models. In Twenty-Fifth Annual Conference on Learning Theory, volume 23, pages 33.1–33.34, 2012c.
  • J. Anderson, M. Belkin, N. Goyal, L. Rademacher, and J. Voss. The more, the merrier: the blessing of dimensionality for learning large Gaussian mixtures. In Twenty-Seventh Annual Conference on Learning Theory, 2014.
  • S. Arora and R. Kannan. Learning mixtures of separated nonspherical Gaussians. The Annals of Applied Probability, 15(1A):69–92, 2005.
  • S. Arora, R. Ge, and A. Moitra. Learning topic models — going beyond SVD. In Fifty-Third Annual IEEE Symposium on Foundations of Computer Science, pages 1–10, 2012a.
  • S. Arora, R. Ge, A. Moitra, and S. Sachdeva. Provable ICA with unknown Gaussian noise, and implications for Gaussian mixtures and autoencoders. In Advances in Neural Information Processing Systems 25, 2012b.
  • T. Austin. On exchangeable random variables and the statistics of large graphs and hypergraphs. Probability Surveys, 5:80–145, 2008.
  • R. Bailly. Quadratic weighted automata: Spectral algorithm and likelihood maximization. Journal of Machine Learning Research, 2011.
  • B. Balle and M. Mohri. Spectral learning of general weighted automata via constrained matrix completion. In Advances in Neural Information Processing Systems 25, 2012.
  • B. Balle, A. Quattoni, and X. Carreras. Local loss optimization in operator models: A new insight into spectral learning. In Twenty-Ninth International Conference on Machine Learning, 2012.
  • M. Belkin and K. Sinha. Polynomial learning of distribution families. In Fifty-First Annual IEEE Symposium on Foundations of Computer Science, pages 103–112, 2010.
  • A. Bhaskara, M. Charikar, A. Moitra, and A. Vijayaraghavan. Smoothed analysis of tensor decompositions. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, 2014.
  • B. Boots, S. M. Siddiqi, and G. J. Gordon. Closing the learning-planning loop with predictive state representations. In Proceedings of the Robotics: Science and Systems Conference, 2010.
  • S. C. Brubaker and S. Vempala. Isotropic PCA and affine-invariant clustering. In Forty-Ninth Annual IEEE Symposium on Foundations of Computer Science, 2008.
  • A. Bunse-Gerstner, R. Byers, and V. Mehrmann. Numerical methods for simultaneous diagonalization. SIAM Journal on Matrix Analysis and Applications, 14(4):927–949, 1993.
  • J.-F. Cardoso. Super-symmetric decomposition of the fourth-order cumulant tensor. Blind identification of more sources than sensors. In 1991 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-91), pages 3109–3112. IEEE, 1991.
  • J.-F. Cardoso. Perturbation of joint diagonalizers. Technical Report 94D027, Signal Department, Telecom Paris, 1994.
  • J.-F. Cardoso and P. Comon. Independent component analysis, a survey of some algebraic methods. In IEEE International Symposium on Circuits and Systems, pages 93–96, 1996.
  • J.-F. Cardoso and A. Souloumiac. Blind beamforming for non-Gaussian signals. IEE Proceedings-F, 140(6):362–370, 1993.
  • D. Cartwright and B. Sturmfels. The number of eigenvalues of a tensor. Linear Algebra and its Applications, 438(2):942–952, 2013.
  • R. B. Cattell. Parallel proportional profiles and other principles for determining the choice of factors by rotation. Psychometrika, 9(4):267–283, 1944.
  • J. T. Chang. Full reconstruction of Markov models on evolutionary trees: Identifiability and consistency. Mathematical Biosciences, 137:51–73, 1996.
  • K. Chaudhuri and S. Rao. Learning mixtures of product distributions using correlations and independence. In Twenty-First Annual Conference on Learning Theory, pages 9–20, 2008.
  • S. B. Cohen, K. Stratos, M. Collins, D. P. Foster, and L. Ungar. Spectral learning of latent-variable PCFGs. In Fiftieth Annual Meeting of the Association for Computational Linguistics, 2012.
  • P. Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314, 1994.
  • P. Comon and C. Jutten. Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press (Elsevier), 2010.
  • P. Comon, G. Golub, L.-H. Lim, and B. Mourrain. Symmetric tensors and symmetric tensor rank. SIAM Journal on Matrix Analysis and Applications, 30(3):1254–1279, 2008.
  • R. M. Corless, P. M. Gianni, and B. M. Trager. A reordered Schur factorization method for zero-dimensional polynomial systems with multiple roots. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, pages 133–140. ACM, 1997.
  • S. Dasgupta. Learning mixtures of Gaussians. In Fortieth Annual IEEE Symposium on Foundations of Computer Science, pages 634–644, 1999.
  • S. Dasgupta and L. Schulman. A probabilistic analysis of EM for mixtures of separated, spherical Gaussians. Journal of Machine Learning Research, 8(Feb):203–226, 2007.
  • L. De Lathauwer, J. Castaing, and J.-F. Cardoso. Fourth-order cumulant-based blind identification of underdetermined mixtures. IEEE Transactions on Signal Processing, 55(6):2965–2973, 2007.
  • N. Delfosse and P. Loubaton. Adaptive blind separation of independent sources: a deflation approach. Signal Processing, 45(1):59–83, 1995.
  • A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum-likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1–38, 1977.
  • P. Dhillon, J. Rodu, M. Collins, D. P. Foster, and L. Ungar. Spectral dependency parsing with latent variables. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012.
  • M. Drton, B. Sturmfels, and S. Sullivant. Algebraic factor analysis: tetrads, pentads and beyond. Probability Theory and Related Fields, 138(3):463–493, 2007.
  • A. T. Erdogan. On the convergence of ICA algorithms with symmetric orthogonalization. IEEE Transactions on Signal Processing, 57:2209–2221, 2009.
  • A. M. Frieze, M. Jerrum, and R. Kannan. Learning linear transformations. In Thirty-Seventh Annual Symposium on Foundations of Computer Science, pages 359–368, 1996.
  • G. H. Golub and C. F. van Loan. Matrix Computations. Johns Hopkins University Press, 1996.
  • N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2), 2011.
  • R. Harshman. Foundations of the PARAFAC procedure: model and conditions for an ‘explanatory’ multi-mode factor analysis. Technical report, UCLA Working Papers in Phonetics, 1970.
  • C. J. Hillar and L.-H. Lim. Most tensor problems are NP-hard. Journal of the ACM, 60(6):45:1–45:39, 2013.
  • F. L. Hitchcock. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics, 6:164–189, 1927a.
  • F. L. Hitchcock. Multiple invariants and generalized rank of a p-way matrix or tensor. Journal of Mathematics and Physics, 7:39–79, 1927b.
  • D. Hsu and S. M. Kakade. Learning mixtures of spherical Gaussians: moment methods and spectral decompositions. In Fourth Innovations in Theoretical Computer Science, 2013.
  • D. Hsu, S. M. Kakade, and P. Liang. Identifiability and unmixing of latent parse trees. In Advances in Neural Information Processing Systems 25, 2012a.
  • D. Hsu, S. M. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. Journal of Computer and System Sciences, 78(5):1460–1480, 2012b.
  • A. Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3):626–634, 1999.
  • A. Hyvärinen and E. Oja. Independent component analysis: algorithms and applications. Neural Networks, 13(4–5):411–430, 2000.
  • H. Jaeger. Observable operator models for discrete stochastic time series. Neural Computation, 12(6), 2000.
  • A. T. Kalai, A. Moitra, and G. Valiant. Efficiently learning mixtures of two Gaussians. In Forty-Second ACM Symposium on Theory of Computing, pages 553–562, 2010.
  • R. Kannan, H. Salmasian, and S. Vempala. The spectral method for general mixture models. SIAM Journal on Computing, 38(3):1141–1156, 2008.
  • E. Kofidis and P. A. Regalia. On the best rank-1 approximation of higher-order supersymmetric tensors. SIAM Journal on Matrix Analysis and Applications, 23(3):863–884, 2002.
  • T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.
  • T. G. Kolda and J. R. Mayo. Shifted power method for computing tensor eigenpairs. SIAM Journal on Matrix Analysis and Applications, 32(4):1095–1124, 2011.
  • J. B. Kruskal. Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra and its Applications, 18(2):95–138, 1977.
  • L. De Lathauwer, B. De Moor, and J. Vandewalle. On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors. SIAM Journal on Matrix Analysis and Applications, 21(4):1324–1342, 2000.
  • L. Le Cam. Asymptotic Methods in Statistical Decision Theory. Springer, 1986.
  • L.-H. Lim. Singular values and eigenvalues of tensors: a variational approach. In Proceedings of the IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, 1:129–132, 2005.
  • M. Littman, R. Sutton, and S. Singh. Predictive representations of state. In Advances in Neural Information Processing Systems 14, pages 1555–1561, 2001.
  • F. M. Luque, A. Quattoni, B. Balle, and X. Carreras. Spectral learning for non-deterministic dependency parsing. In Conference of the European Chapter of the Association for Computational Linguistics, 2012.
  • J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281–297. University of California Press, 1967.
  • P. McCullagh. Tensor Methods in Statistics. Chapman and Hall, 1987.
  • A. Moitra and G. Valiant. Settling the polynomial learnability of mixtures of Gaussians. In Fifty-First Annual IEEE Symposium on Foundations of Computer Science, pages 93–102, 2010.
  • E. Mossel and S. Roch. Learning nonsingular phylogenies and hidden Markov models. Annals of Applied Probability, 16(2):583–614, 2006.
  • P. Q. Nguyen and O. Regev. Learning a parallelepiped: Cryptanalysis of GGH and NTRU signatures. Journal of Cryptology, 22(2):139–160, 2009.
  • J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 1999.
  • P. Van Overschee and B. De Moor. Subspace Identification of Linear Systems. Kluwer Academic Publishers, 1996.
  • L. Pachter and B. Sturmfels. Algebraic Statistics for Computational Biology, volume 13. Cambridge University Press, 2005.
  • A. Parikh, L. Song, and E. P. Xing. A spectral algorithm for latent tree graphical models. In Twenty-Eighth International Conference on Machine Learning, 2011.
  • K. Pearson. Contributions to the mathematical theory of evolution. Philosophical Transactions of the Royal Society of London, A, page 71, 1894.
  • L. Qi. Eigenvalues of a real supersymmetric tensor. Journal of Symbolic Computation, 40(6):1302–1324, 2005.
  • R. A. Redner and H. F. Walker. Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2):195–239, 1984.
  • P. A. Regalia and E. Kofidis. Monotonic convergence of fixed-point algorithms for ICA. IEEE Transactions on Neural Networks, 14:943–949, 2003.
  • S. Roch. A short proof that phylogenetic tree reconstruction by maximum likelihood is hard. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3(1), 2006.
  • J. Rodu, D. P. Foster, W. Wu, and L. H. Ungar. Using regression for spectral estimation of HMMs. In Statistical Language and Speech Processing, pages 212–223, 2013.
  • M. P. Schützenberger. On the definition of a family of automata. Information and Control, 4:245–270, 1961.
  • S. M. Siddiqi, B. Boots, and G. J. Gordon. Reduced-rank hidden Markov models. In Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.
  • D. A. Spielman and S. H. Teng. Smoothed analysis: An attempt to explain the behavior of algorithms in practice. Communications of the ACM, pages 76–84, 2009.
  • A. Stegeman and P. Comon. Subtracting a best rank-1 approximation may increase tensor rank. Linear Algebra and its Applications, 433:1276–1300, 2010.
  • B. Sturmfels and P. Zwiernik. Binary cumulant varieties. Annals of Combinatorics, 17:229–250, 2013.
  • S. Vempala and G. Wang. A spectral algorithm for learning mixture models. Journal of Computer and System Sciences, 68(4):841–860, 2004.
  • P. Wedin. Perturbation bounds in connection with singular value decomposition. BIT Numerical Mathematics, 12(1):99–111, 1972.
  • T. Zhang and G. Golub. Rank-one approximation to high order tensors. SIAM Journal on Matrix Analysis and Applications, 23:534–550, 2001.
  • A. Ziehe, P. Laskov, G. Nolte, and K.-R. Müller. A fast algorithm for joint diagonalization with non-orthogonal transformations and its application to blind source separation. Journal of Machine Learning Research, 5:777–800, 2004.
  • J. Zou, D. Hsu, D. Parkes, and R. P. Adams. Contrastive learning using spectral methods. In Advances in Neural Information Processing Systems 26, 2013.