Faster Algorithms for High-Dimensional Robust Covariance Estimation

COLT, pp. 727-757, 2019.

Keywords:
frobenius error, robust mean estimation, width-independent, fast algorithm, positive semidefinite

Abstract:

We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial-time algorithms for this problem with near-optimal error guarantees for several natural structured distributions. Our main contribution is to develop faster robust algorithms for this problem, with running time that almost matches that of computing the empirical covariance matrix.

Introduction
  • Estimating the covariance matrix of a high-dimensional distribution is one of the most fundamental statistical tasks (see, e.g., Bickel and Levina (2008a,b) and references therein).
  • On input an ε-corrupted set of N = O(d^2/ε^2) samples from N(0, Σ) on R^d, the algorithm runs in time O(d^{3.26})/poly(ε) and outputs a covariance estimate with a near-optimal error guarantee, matching the one in Diakonikolas et al. (2016); a toy illustration of this contamination model appears after the list.
  • The authors' first algorithmic result states that one can robustly estimate the covariance matrix of a high-dimensional Gaussian within multiplicative, dimension-independent error, with running time that almost matches that of computing the empirical covariance matrix.
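To make the ε-contamination setting above concrete, here is a toy sketch (all parameters and names are illustrative, not the paper's experiments): an adversary overwrites an ε-fraction of Gaussian samples, and the naive empirical covariance is pushed far from the truth, which is exactly the failure mode the robust algorithms are designed to prevent.

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps = 10, 0.05
N = int(d**2 / eps**2)          # the N = O(d^2/eps^2) sample regime

# Clean samples from N(0, Sigma) with Sigma = I.
Sigma = np.eye(d)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=N)

# Adversary replaces an eps-fraction with points far out along e_1.
m = int(eps * N)
X[:m] = 50.0 * np.eye(d)[0]

# Non-robust baseline: the corrupted direction blows up to about
# eps * 50^2 = 125, versus the true eigenvalue 1.
empirical = X.T @ X / N
print(np.linalg.eigvalsh(empirical)[-1])
```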
Highlights
  • Estimating the covariance matrix of a high-dimensional distribution is one of the most fundamental statistical tasks (see, e.g., Bickel and Levina (2008a,b) and references therein)
  • For a range of well-behaved distribution families, the empirical covariance matrix is known to converge to the true covariance matrix at an optimal statistical rate
  • The recursive dimension-halving estimator of Lai et al. (2016) requires SVD computations of a d^2 × d^2 "covariance" matrix, and hence has runtime Ω(d^{2ω}), where ω is the exponent of matrix multiplication; plugging in the best-known bound on ω (Gall (2014)) gives a runtime of Ω(d^{4.74}). The arithmetic behind this figure is spelled out after the list.
  • We develop a related robust covariance estimation algorithm (Algorithm 3) with an additive error guarantee, whose running time does not depend on the condition number of Σ.
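The Ω(d^{4.74}) figure follows from two facts: a fast, numerically stable SVD of an n × n matrix costs O(n^ω) up to lower-order terms (Demmel et al. (2007)), and ω ≈ 2.373 (Gall (2014)). As a worked check:

```latex
% One SVD of the d^2 x d^2 ``covariance'' matrix already costs
\[
  (d^2)^{\omega} \;=\; d^{2\omega} \;\approx\; d^{2 \cdot 2.373}
  \;\approx\; d^{4.74},
\]
% which is why any estimator performing such an SVD has runtime
% \Omega(d^{2\omega}).
```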
Results
  • Theorem 2 (Robust Covariance Estimation, Multiplicative): Given as input an ε-corrupted set of N = Ω(d^2/ε^2) samples drawn from D, there is an algorithm (Algorithm 1) that runs in time O(d^{3.26} log(κ))/poly(ε) and outputs Σ̂ ∈ R^{d×d} such that, with high probability, ‖Σ^{-1/2} Σ̂ Σ^{-1/2} − I‖_F ≤ O(ε log(1/ε)).
  • Theorem 3 (Robust Covariance Estimation, Additive): For the same setting as in Theorem 2, there is an algorithm (Algorithm 3) that runs in time O(d^{3.26})/poly(ε) and outputs Σ̂ ∈ R^{d×d} such that, with high probability, ‖Σ̂ − Σ‖_F ≤ O(ε log(1/ε)) · ‖Σ‖_2.
  • The authors' algorithm needs to solve the robust mean estimation problem for the tensors X ⊗ X without computing these d^2-dimensional vectors explicitly; a sketch of this implicit-tensor trick follows the list.
  • A natural way of improving the running time of non-robust covariance estimation is to approximate the product X^T X using oblivious sketching (see, e.g., Woodruff (2014) for a survey), which works roughly as follows; a minimal sketching example also appears after the list.
  • In the proof of Theorem 2, the authors first use Lemma 7 to find an upper bound Σ0 ∈ R^{d×d} on the true covariance matrix Σ, satisfying Σ ⪯ Σ0 ⪯ (κ · poly(d)) · Σ.
  • Because Z has bounded covariance, the authors can use a robust mean estimation algorithm from Cheng et al. (2019), stated as Lemma 10 (Mean Estimation for Bounded-Covariance Distributions).
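The implicit-tensor bullet is the computational heart of the approach: for a zero-mean X, E[X ⊗ X] = vec(Σ), so robust covariance estimation reduces to robust mean estimation over the d^2-dimensional vectors Z_i = X_i ⊗ X_i. The sketch below is a hedged illustration (the function names are ours, not the paper's Algorithm 1) of how the two primitives such an algorithm needs, inner products ⟨Z_i, v⟩ and weighted means of the Z_i, reduce to d × d matrix operations on the X_i.

```python
import numpy as np

def implicit_tensor_dot(X, v):
    """All N inner products <X_i (x) X_i, v> for a direction v in R^{d^2}.
    Since <X_i (x) X_i, vec(V)> = X_i^T V X_i, this takes O(N d^2) time
    and never materializes any length-d^2 vector."""
    d = X.shape[1]
    V = v.reshape(d, d)
    return np.einsum('nd,de,ne->n', X, V, X)   # X_i^T V X_i for each i

def implicit_tensor_mean(X, w):
    """Weighted mean of the Z_i, returned as a d x d matrix:
    sum_i w_i X_i X_i^T = X^T diag(w) X, a single d x d product."""
    return X.T @ (w[:, None] * X)

# For X_i ~ N(0, Sigma), the (robust) mean of the Z_i is a flattened
# (robust) covariance estimate; with uniform weights it is simply the
# empirical covariance.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
Sigma_hat = implicit_tensor_mean(X, np.full(1000, 1 / 1000))
```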
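For the sketching bullet: a standard oblivious sketch such as CountSketch compresses the N × d sample matrix X to SX, where S is a sparse random k × N matrix, and (SX)^T(SX) is then an unbiased approximation of X^T X. This is a minimal illustration of the generic technique surveyed in Woodruff (2014), not necessarily the sketch or parameters used in the paper.

```python
import numpy as np

def countsketch(X, k, rng):
    """Apply a k x N CountSketch to X in O(nnz(X)) time: each row of X
    is hashed to one of k buckets and multiplied by a random sign."""
    N = X.shape[0]
    buckets = rng.integers(0, k, size=N)
    signs = rng.choice([-1.0, 1.0], size=N)
    SX = np.zeros((k, X.shape[1]))
    np.add.at(SX, buckets, signs[:, None] * X)
    return SX

rng = np.random.default_rng(1)
N, d, k = 20000, 30, 2000
X = rng.standard_normal((N, d))
SX = countsketch(X, k, rng)

# E[(SX)^T (SX)] = X^T X; for large enough k the error is small.
err = np.linalg.norm(SX.T @ SX - X.T @ X) / np.linalg.norm(X.T @ X)
print(f"relative Frobenius error: {err:.3f}")
```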
Conclusion
  • Lemma 11 (Mean Estimation with Approximately Known Covariance): Let D be a distribution supported on R^d with unknown mean μ and covariance Σ.
  • Given an ε-corrupted set of N = Ω(d/δ^2) samples drawn from D, Algorithm 2 outputs a hypothesis vector μ̂ such that, with high probability, ‖μ̂ − μ‖_2 ≤ O(δ); a simplified filtering sketch for subroutines of this kind follows the list.
  • Proposition 12 (Mean Estimation with Tensor Input): If all input samples (Z_i)_{i=1}^N have the form Z_i = Y_i ⊗ Y_i for some Y_i ∈ R^d, and they are given implicitly as the vectors (X_i)_{i=1}^N, then both Algorithms 5 and 2 can be implemented to run in O(d^{3.26}/ε^8) time.
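For intuition about the robust mean estimation subroutines invoked above (Lemmas 10 and 11), here is a deliberately simplified filtering iteration in the spirit of this line of work (e.g., Diakonikolas et al. (2016)): while the empirical covariance has a large eigenvalue, throw away the points that project most heavily onto the top eigenvector. The constants are illustrative; this is a didactic sketch, not the paper's Algorithm 2 or the nearly-linear-time algorithm of Cheng et al. (2019).

```python
import numpy as np

def filter_mean(Z, eps, thresh=10.0, max_iter=50):
    """Simplified filter: repeatedly prune the eps-quantile of points
    with the largest projection onto the top eigenvector of the
    empirical covariance, until that covariance looks bounded."""
    Z = Z.copy()
    for _ in range(max_iter):
        mu = Z.mean(axis=0)
        vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
        if vals[-1] <= thresh:                   # covariance bounded: stop
            break
        scores = np.abs((Z - mu) @ vecs[:, -1])  # suspicion scores
        Z = Z[scores <= np.quantile(scores, 1.0 - eps)]
    return Z.mean(axis=0)

# Usage: identity-covariance inliers plus an eps-fraction of outliers.
rng = np.random.default_rng(2)
N, d, eps = 5000, 20, 0.05
Z = rng.standard_normal((N, d))
Z[: int(eps * N)] += 30.0 * np.eye(d)[0]   # shift outliers along e_1
print(filter_mean(Z, eps))                  # close to the zero vector
```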
Funding
  • Ilias Diakonikolas was supported by NSF Award CCF-1652862 (CAREER) and a Sloan Research Fellowship
  • Rong Ge was supported by NSF Awards CCF-1704656 and CCF-1845171 (CAREER), a Sloan Research Fellowship, and a Google Faculty Research Award
  • David Woodruff was supported in part by Office of Naval Research (ONR) grant N00014-18-1-2562
References
  • Z. Allen-Zhu, Y. Lee, and L. Orecchia. Using optimization to obtain a width-independent, parallel, simpler, and faster positive SDP solver. In Proc. 27th Annual Symposium on Discrete Algorithms (SODA), pages 1824–1831, 2016.
  • S. Arora and S. Kale. A combinatorial, primal-dual approach to semidefinite programs. Journal of the ACM, 63(2):12:1–12:35, 2016.
  • S. Balakrishnan, S. S. Du, J. Li, and A. Singh. Computationally efficient robust sparse estimation in high dimensions. In Proc. 30th Annual Conference on Learning Theory (COLT), pages 169–212, 2017.
  • P. J. Bickel and E. Levina. Covariance regularization by thresholding. Ann. Statist., 36(6):2577–2604, 2008a.
  • P. J. Bickel and E. Levina. Regularized estimation of large covariance matrices. Ann. Statist., 36(1):199–227, 2008b.
  • J. L. Bordewijk. Inter-reciprocity applied to electrical networks. Applied Scientific Research, Section A, 6(1):1–74, 1957.
  • M. Braverman and A. Rao. Information equals amortized communication. In Proc. 52nd IEEE Symposium on Foundations of Computer Science (FOCS), pages 748–757, 2011.
  • M. Braverman, A. Garg, D. Pankratov, and O. Weinstein. Information lower bounds via self-reducibility. Theory of Computing Systems, 59(2):377–396, 2016.
  • T. T. Cai, C.-H. Zhang, and H. H. Zhou. Optimal rates of convergence for covariance matrix estimation. Ann. Statist., 38(4):2118–2144, 2010.
  • A. Chakrabarti and O. Regev. An optimal lower bound on the communication complexity of gap-hamming-distance. SIAM J. on Comput., 41(5):1299–1317, 2012.
  • M. Charikar, J. Steinhardt, and G. Valiant. Learning from untrusted data. In Proc. 49th Annual ACM Symposium on Theory of Computing (STOC), pages 47–60, 2017.
  • M. Chen, C. Gao, and Z. Ren. Robust covariance and scatter matrix estimation under Huber's contamination model. Ann. Statist., 46(5):1932–1960, 2018.
  • Y. Cheng, I. Diakonikolas, D. M. Kane, and A. Stewart. Robust learning of fixed-structure Bayesian networks. In Proc. 32nd Annual Conference on Neural Information Processing Systems (NeurIPS), pages 10304–10316, 2018.
  • Y. Cheng, I. Diakonikolas, and R. Ge. High-dimensional robust mean estimation in nearly-linear time. In Proc. 30th Annual Symposium on Discrete Algorithms (SODA), pages 2755–2771, 2019.
  • J. Demmel, I. Dumitriu, and O. Holtz. Fast linear algebra is stable. Numerische Mathematik, 108(1):59–91, 2007.
  • I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Robust estimators in high dimensions without the computational intractability. In Proc. 57th IEEE Symposium on Foundations of Computer Science (FOCS), pages 655–664, 2016.
  • I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Being robust (in high dimensions) can be practical. In Proc. 34th International Conference on Machine Learning (ICML), pages 999–1008, 2017a.
  • I. Diakonikolas, D. M. Kane, and A. Stewart. Statistical query lower bounds for robust estimation of high-dimensional Gaussians and Gaussian mixtures. In Proc. 58th IEEE Symposium on Foundations of Computer Science (FOCS), pages 73–84, 2017b.
  • I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Robustly learning a Gaussian: Getting optimal error, efficiently. In Proc. 29th Annual Symposium on Discrete Algorithms (SODA), pages 2683–2702, 2018a.
  • I. Diakonikolas, D. M. Kane, and A. Stewart. List-decodable robust mean estimation and learning mixtures of spherical Gaussians. In Proc. 50th Annual ACM Symposium on Theory of Computing (STOC), pages 1047–1060, 2018b.
  • I. Diakonikolas, D. M. Kane, and A. Stewart. Learning geometric concepts with nasty noise. In Proc. 50th Annual ACM Symposium on Theory of Computing (STOC), pages 1061–1073, 2018c.
  • I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, J. Steinhardt, and A. Stewart. Sever: A robust meta-algorithm for stochastic optimization. In Proc. 36th International Conference on Machine Learning (ICML), 2019a.
  • I. Diakonikolas, W. Kong, and A. Stewart. Efficient algorithms and lower bounds for robust linear regression. In Proc. 30th Annual Symposium on Discrete Algorithms (SODA), pages 2745–2754, 2019b.
  • C. M. Fiduccia. On the algebraic complexity of matrix multiplication. PhD thesis, Brown University, 1973.
  • F. L. Gall. Faster algorithms for rectangular matrix multiplication. In Proc. 53rd IEEE Symposium on Foundations of Computer Science (FOCS), pages 514–523, 2012.
  • F. L. Gall. Powers of tensors and fast matrix multiplication. In International Symposium on Symbolic and Algebraic Computation (ISSAC), pages 296–303, 2014.
  • S. B. Hopkins and J. Li. Mixture models, robustness, and sum of squares proofs. In Proc. 50th Annual ACM Symposium on Theory of Computing (STOC), pages 1021–1034, 2018.
  • P. J. Huber. Robust estimation of a location parameter. Ann. Math. Statist., 35(1):73–101, 1964.
  • D. M. Kane. Robust covariance estimation. Talk given at the TTIC Workshop on Computational Efficiency and High-Dimensional Robust Statistics, 2018. Available at http://www.iliasdiakonikolas.org/tti-robust/Kane-Covariance.pdf.
  • A. Klivans, P. Kothari, and R. Meka. Efficient algorithms for outlier-robust regression. In Proc. 31st Annual Conference on Learning Theory (COLT), pages 1420–1430, 2018.
  • P. K. Kothari, J. Steinhardt, and D. Steurer. Robust moment estimation and improved clustering via sum of squares. In Proc. 50th Annual ACM Symposium on Theory of Computing (STOC), pages 1035–1046, 2018.
  • K. A. Lai, A. B. Rao, and S. Vempala. Agnostic estimation of mean and covariance. In Proc. 57th IEEE Symposium on Foundations of Computer Science (FOCS), pages 665–674, 2016.
  • L. Liu, Y. Shen, T. Li, and C. Caramanis. High dimensional robust sparse regression. arXiv preprint arXiv:1805.11643, 2018.
  • G. Lotti and F. Romani. On the asymptotic complexity of rectangular matrix multiplication. Theor. Comp. Sci., 23:171–185, 1983.
  • J. Novembre, T. Johnson, K. Bryc, Z. Kutalik, A. R. Boyko, A. Auton, A. Indap, K. S. King, S. Bergmann, M. R. Nelson, et al. Genes mirror geography within Europe. Nature, 456(7218):98–101, 2008.
  • R. Pagh, M. Stockel, and D. P. Woodruff. Is min-wise hashing optimal for summarizing set intersection? In Proc. 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pages 109–120, 2014.
  • R. Peng, K. Tangwongsan, and P. Zhang. Faster and simpler width-independent parallel algorithms for positive semidefinite programming. arXiv preprint arXiv:1201.5135v3, 2016.
  • A. Prasad, A. S. Suggala, S. Balakrishnan, and P. Ravikumar. Robust estimation via robust gradient estimation. arXiv preprint arXiv:1802.06485, 2018.
  • P. Rousseeuw. Multivariate estimation with high breakdown point. Mathematical Statistics and Applications, pages 283–297, 1985.
  • J. Steinhardt, M. Charikar, and G. Valiant. Resilience: A criterion for learning in the presence of arbitrary outliers. In Proc. 9th Innovations in Theoretical Computer Science Conference (ITCS), pages 45:1–45:21, 2018.
  • J. W. Tukey. Mathematics and picturing of data. In Proceedings of the ICM, volume 6, pages 523–531, 1975.
  • D. P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(1-2):1–157, 2014.