# Faster Algorithms for High-Dimensional Robust Covariance Estimation

COLT, pp. 727-757, 2019.

Keywords: Frobenius error, robust mean estimation, width independent, fast algorithm, positive semidefinite

Abstract:

We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal error guarantees for several natural structured distributions. Our main contribut...

Introduction
• Estimating the covariance matrix of a high-dimensional distribution is one of the most fundamental statistical tasks (see, e.g., Bickel and Levina (2008a,b) and references therein).
• On input an $\varepsilon$-corrupted set of $N = O(d^2/\varepsilon^2)$ samples from $\mathcal{N}(0, \Sigma)$ on $\mathbb{R}^d$, the algorithm runs in time $O(d^{3.26})/\mathrm{poly}(\varepsilon)$ and outputs a covariance estimate with a near-optimal error guarantee, matching that of Diakonikolas et al. (2016).
• The first algorithmic result shows that the covariance matrix of a high-dimensional Gaussian can be robustly estimated within multiplicative, dimension-independent error, with running time that almost matches that of computing the empirical covariance matrix.
Highlights
• For a range of well-behaved distribution families, the empirical covariance matrix is known to converge to the true covariance matrix at an optimal statistical rate
• The recursive dimension-halving estimator of Lai et al. (2016) requires SVD computations of a $d^2 \times d^2$ “covariance” matrix and therefore has runtime $\Omega(d^{2\omega})$, where $\omega$ is the exponent of matrix multiplication. (Plugging in the best-known value for $\omega$ (Le Gall (2014)) gives a runtime of $\Omega(d^{4.74})$.)
• We develop a related robust covariance estimation algorithm (Algorithm 3) with additive error guarantee, whose running time does not depend on the condition number of Σ
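As a quick arithmetic check of the exponent quoted for the Lai et al. (2016) estimator, assuming the bound $\omega < 2.3728639$ reported by Le Gall (2014) (a value this summary does not state explicitly):

```latex
\[
  2\omega \;\approx\; 2 \times 2.3729 \;=\; 4.7458,
  \qquad\text{so}\qquad
  d^{2\omega} \approx d^{4.74}
  \quad\text{(exponent truncated to two decimals)}.
\]
```

By comparison, the running times reported for the new algorithms scale as $d^{3.26}$, up to the stated factors.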
Results
• Given as input an $\varepsilon$-corrupted set of $N = \Omega(d^2/\varepsilon^2)$ samples drawn from $D$, there is an algorithm (Algorithm 1) that runs in time $O(d^{3.26} \log(\kappa))/\mathrm{poly}(\varepsilon)$ and outputs $\hat{\Sigma} \in \mathbb{R}^{d \times d}$ such that, with high probability, $\|\Sigma^{-1/2} \hat{\Sigma} \Sigma^{-1/2} - I\|_F \le O(\varepsilon \log(1/\varepsilon))$.
• Theorem 3 (Robust Covariance Estimation (Additive)) For the same setting as in Theorem 2, there is an algorithm (Algorithm 3) that runs in time $O(d^{3.26})/\mathrm{poly}(\varepsilon)$ and outputs $\hat{\Sigma} \in \mathbb{R}^{d \times d}$ such that, with high probability, $\|\hat{\Sigma} - \Sigma\|_F \le O(\varepsilon \log(1/\varepsilon)) \, \|\Sigma\|_2$.
• The authors' algorithm needs to solve the robust mean estimation problem for X ⊗ X without computing these vectors explicitly.
• A natural way to improve the running time of non-robust covariance estimation is to approximate the product $X^\top X$ using oblivious sketching (see, e.g., Woodruff (2014) for a survey).
• Proof of Theorem 2: The authors first use Lemma 7 to find an upper bound $\Sigma_0 \in \mathbb{R}^{d \times d}$ on the true covariance matrix $\Sigma$ such that $\Sigma \preceq \Sigma_0 \preceq (\kappa \, \mathrm{poly}(d)) \, \Sigma$.
• Because Z has bounded covariance, the authors can use the following robust mean estimation algorithm from Cheng et al. (2019).
• Lemma 10 (Mean Estimation for Bounded-Covariance Distributions Cheng et al (2019)) Let
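The two error measures in the theorems above can be made concrete with a small NumPy sketch. It evaluates the multiplicative error $\|\Sigma^{-1/2} \hat{\Sigma} \Sigma^{-1/2} - I\|_F$ and the additive error $\|\hat{\Sigma} - \Sigma\|_F / \|\Sigma\|_2$ for the plain empirical covariance, with and without corruption. The function names, the diagonal choice of $\Sigma$, and the simple "all outliers at one far-away point" adversary are illustrative assumptions; this is not the paper's Algorithm 1 or Algorithm 3, only a way to see why a robust estimator is needed.

```python
import numpy as np

def inv_sqrt_psd(sigma):
    """Inverse square root of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(sigma)
    # V diag(1/sqrt(lambda)) V^T
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T

def multiplicative_error(sigma_hat, sigma):
    """|| Sigma^{-1/2} Sigma_hat Sigma^{-1/2} - I ||_F."""
    w = inv_sqrt_psd(sigma)
    d = sigma.shape[0]
    return np.linalg.norm(w @ sigma_hat @ w - np.eye(d), ord="fro")

def additive_error(sigma_hat, sigma):
    """|| Sigma_hat - Sigma ||_F divided by the spectral norm of Sigma."""
    return np.linalg.norm(sigma_hat - sigma, ord="fro") / np.linalg.norm(sigma, ord=2)

rng = np.random.default_rng(0)
d, n, eps = 5, 20000, 0.05
sigma = np.diag(np.arange(1.0, d + 1.0))  # assumed ground-truth covariance
clean = rng.multivariate_normal(np.zeros(d), sigma, size=n)

# Hypothetical adversary: replace an eps-fraction of samples by one distant point.
corrupted = clean.copy()
k = int(eps * n)
corrupted[:k] = 10.0 * np.sqrt(d) * np.ones(d)

# Empirical covariance of zero-mean data, the non-robust baseline.
emp_clean = clean.T @ clean / n
emp_corrupted = corrupted.T @ corrupted / n

err_clean = multiplicative_error(emp_clean, sigma)
err_corrupted = multiplicative_error(emp_corrupted, sigma)
print(f"multiplicative error, clean samples:     {err_clean:.3f}")
print(f"multiplicative error, corrupted samples: {err_corrupted:.3f}")
```

In this toy setup the empirical covariance is accurate on clean samples, while a 5% corruption moves it far from $\Sigma$ in both measures.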
Conclusion
• Lemma 11 (Mean Estimation with Approximately Known Covariance) Let $D$ be a distribution supported on $\mathbb{R}^d$ with unknown mean $\mu$ and covariance $\Sigma$.
• Given an $\varepsilon$-corrupted set of $N = \Omega(d/\delta^2)$ samples drawn from $D$, Algorithm 2 outputs a hypothesis vector $\hat{\mu}$ such that, with high probability, $\|\hat{\mu} - \mu\|_2 \le O(\delta)$.
• Proposition 12 (Mean Estimation with Tensor Input) If all input samples $(Z_i)_{i=1}^N$ have the form $Z_i = Y_i \otimes Y_i$ for some $Y_i \in \mathbb{R}^d$, and they are given implicitly as the vectors $(X_i)_{i=1}^N$, both Algorithms 5 and 2 can be implemented to run in $O(d^{3.26}/\varepsilon^8)$ time.
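The idea of working with tensor inputs "implicitly" rests on a standard identity: for $Z = Y \otimes Y$ and a $d \times d$ matrix $M$, $\langle Z, \mathrm{vec}(M) \rangle = Y^\top M Y$, so inner products and weighted averages of the $Z_i$ can be computed from the $d$-dimensional vectors alone. The sketch below checks this identity numerically. The function names are hypothetical, and it uses ordinary matrix products rather than the fast rectangular matrix multiplication behind the $d^{3.26}$ bound, so it illustrates only the implicit representation, not the stated running time.

```python
import numpy as np

def implicit_inner_products(Y, M):
    """Return <Y_i (x) Y_i, vec(M)> = Y_i^T M Y_i for every row Y_i of Y.

    Never materializes a d^2-dimensional vector.
    """
    return np.einsum("ni,ij,nj->n", Y, M, Y)

def implicit_weighted_mean(Y, w):
    """Return sum_i w_i (Y_i (x) Y_i), stored as the d x d matrix Y^T diag(w) Y."""
    return (Y * w[:, None]).T @ Y

rng = np.random.default_rng(1)
N, d = 50, 4
Y = rng.standard_normal((N, d))
M = rng.standard_normal((d, d))
w = np.full(N, 1.0 / N)

# Explicit reference computation on the d^2-dimensional vectors (for checking only).
Z = np.stack([np.kron(y, y) for y in Y])
explicit_ip = Z @ M.reshape(-1)
explicit_mean = (w @ Z).reshape(d, d)

print(np.allclose(implicit_inner_products(Y, M), explicit_ip))
print(np.allclose(implicit_weighted_mean(Y, w), explicit_mean))
```

The explicit route stores $N d^2$ numbers, while the implicit one keeps only the $N \times d$ matrix of samples and $d \times d$ matrices.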
Funding
• Ilias Diakonikolas was supported by NSF Award CCF-1652862 (CAREER) and a Sloan Research Fellowship
• Rong Ge is supported by NSF Awards CCF-1704656 and CCF-1845171 (CAREER), a Sloan Research Fellowship, and a Google Faculty Research Award
• David Woodruff was supported in part by Office of Naval Research (ONR) grant N00014-18-1-2562
Reference
• Z. Allen-Zhu, Y. Lee, and L. Orecchia. Using optimization to obtain a width-independent, parallel, simpler, and faster positive SDP solver. In Proc. 27th Annual Symposium on Discrete Algorithms (SODA), pages 1824–1831, 2016.
• S. Arora and S. Kale. A combinatorial, primal-dual approach to semidefinite programs. Journal of the ACM, 63(2):12:1–12:35, 2016.
• S. Balakrishnan, S. S. Du, J. Li, and A. Singh. Computationally efficient robust sparse estimation in high dimensions. In Proc. 30th Annual Conference on Learning Theory (COLT), pages 169–212, 2017.
• P. J. Bickel and E. Levina. Covariance regularization by thresholding. Ann. Statist., 36(6):2577–2604, 2008a.
• P. J. Bickel and E. Levina. Regularized estimation of large covariance matrices. Ann. Statist., 36(1):199–227, 2008b.
• J. L. Bordewijk. Inter-reciprocity applied to electrical networks. Applied Scientific Research, Section A, 6(1):1–74, 1957.
• M. Braverman and A. Rao. Information equals amortized communication. In Proc. 52nd IEEE Symposium on Foundations of Computer Science (FOCS), pages 748–757, 2011.
• M. Braverman, A. Garg, D. Pankratov, and O. Weinstein. Information lower bounds via self-reducibility. Theory of Computing Systems, 59(2):377–396, 2016.
• T. T. Cai, C.-H. Zhang, and H. H. Zhou. Optimal rates of convergence for covariance matrix estimation. Ann. Statist., 38(4):2118–2144, 2010.
• A. Chakrabarti and O. Regev. An optimal lower bound on the communication complexity of gap-Hamming-distance. SIAM J. on Comput., 41(5):1299–1317, 2012.
• M. Charikar, J. Steinhardt, and G. Valiant. Learning from untrusted data. In Proc. 49th Annual ACM Symposium on Theory of Computing (STOC), pages 47–60, 2017.
• M. Chen, C. Gao, and Z. Ren. Robust covariance and scatter matrix estimation under Huber’s contamination model. Ann. Statist., 46(5):1932–1960, 2018.
• Y. Cheng, I. Diakonikolas, D. M. Kane, and A. Stewart. Robust learning of fixed-structure Bayesian networks. In Proc. 33rd Annual Conference on Neural Information Processing Systems (NeurIPS), pages 10304–10316, 2018.
• Y. Cheng, I. Diakonikolas, and R. Ge. High-dimensional robust mean estimation in nearly-linear time. In Proc. 30th Annual Symposium on Discrete Algorithms (SODA), pages 2755–2771, 2019.
• J. Demmel, I. Dumitriu, and O. Holtz. Fast linear algebra is stable. Numerische Mathematik, 108(1):59–91, 2007.
• I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Robust estimators in high dimensions without the computational intractability. In Proc. 57th IEEE Symposium on Foundations of Computer Science (FOCS), pages 655–664, 2016.
• I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Being robust (in high dimensions) can be practical. In Proc. 34th International Conference on Machine Learning (ICML), pages 999–1008, 2017a.
• I. Diakonikolas, D. M. Kane, and A. Stewart. Statistical query lower bounds for robust estimation of high-dimensional Gaussians and Gaussian mixtures. In Proc. 58th IEEE Symposium on Foundations of Computer Science (FOCS), pages 73–84, 2017b.
• I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Robustly learning a Gaussian: Getting optimal error, efficiently. In Proc. 29th Annual Symposium on Discrete Algorithms (SODA), pages 2683–2702, 2018a.
• I. Diakonikolas, D. M. Kane, and A. Stewart. List-decodable robust mean estimation and learning mixtures of spherical Gaussians. In Proc. 50th Annual ACM Symposium on Theory of Computing (STOC), pages 1047–1060, 2018b.
• I. Diakonikolas, D. M. Kane, and A. Stewart. Learning geometric concepts with nasty noise. In Proc. 50th Annual ACM Symposium on Theory of Computing (STOC), pages 1061–1073, 2018c.
• I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, J. Steinhardt, and A. Stewart. Sever: A robust meta-algorithm for stochastic optimization. In Proc. 36th International Conference on Machine Learning (ICML), 2019a.
• I. Diakonikolas, W. Kong, and A. Stewart. Efficient algorithms and lower bounds for robust linear regression. In Proc. 30th Annual Symposium on Discrete Algorithms (SODA), pages 2745–2754, 2019b.
• C. M. Fiduccia. On the algebraic complexity of matrix multiplication. PhD thesis, Brown University, 1973.
• F. Le Gall. Faster algorithms for rectangular matrix multiplication. In Proc. 53rd IEEE Symposium on Foundations of Computer Science (FOCS), pages 514–523, 2012.
• F. Le Gall. Powers of tensors and fast matrix multiplication. In International Symposium on Symbolic and Algebraic Computation (ISSAC), pages 296–303, 2014.
• S. B. Hopkins and J. Li. Mixture models, robustness, and sum of squares proofs. In Proc. 50th Annual ACM Symposium on Theory of Computing (STOC), pages 1021–1034, 2018.
• P. J. Huber. Robust estimation of a location parameter. Ann. Math. Statist., 35(1):73–101, 1964.
• D. M. Kane. Robust covariance estimation. Talk given at TTIC Workshop on Computational Efficiency and High-Dimensional Robust Statistics, 2018. Available at http://www.iliasdiakonikolas.org/tti-robust/Kane-Covariance.pdf.
• A. Klivans, P. Kothari, and R. Meka. Efficient algorithms for outlier-robust regression. In Proc. 31st Annual Conference on Learning Theory (COLT), pages 1420–1430, 2018.
• P. K. Kothari, J. Steinhardt, and D. Steurer. Robust moment estimation and improved clustering via sum of squares. In Proc. 50th Annual ACM Symposium on Theory of Computing (STOC), pages 1035–1046, 2018.
• K. A. Lai, A. B. Rao, and S. Vempala. Agnostic estimation of mean and covariance. In Proc. 57th IEEE Symposium on Foundations of Computer Science (FOCS), pages 665–674, 2016.
• L. Liu, Y. Shen, T. Li, and C. Caramanis. High dimensional robust sparse regression. arXiv preprint arXiv:1805.11643, 2018.
• G. Lotti and F. Romani. On the asymptotic complexity of rectangular matrix multiplication. Theor. Comp. Sci., 23:171–185, 1983.
• J. Novembre, T. Johnson, K. Bryc, Z. Kutalik, A. R. Boyko, A. Auton, A. Indap, K. S. King, S. Bergmann, M. R. Nelson, et al. Genes mirror geography within Europe. Nature, 456(7218):98–101, 2008.
• R. Pagh, M. Stockel, and D. P. Woodruff. Is min-wise hashing optimal for summarizing set intersection? In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pages 109–120, 2014.
• R. Peng, K. Tangwongsan, and P. Zhang. Faster and simpler width-independent parallel algorithms for positive semidefinite programming. arXiv preprint arXiv:1201.5135v3, 2016.
• A. Prasad, A. S. Suggala, S. Balakrishnan, and P. Ravikumar. Robust estimation via robust gradient estimation. arXiv preprint arXiv:1802.06485, 2018.
• P. Rousseeuw. Multivariate estimation with high breakdown point. Mathematical Statistics and Applications, pages 283–297, 1985.
• J. Steinhardt, M. Charikar, and G. Valiant. Resilience: A criterion for learning in the presence of arbitrary outliers. In Proc. 9th Innovations in Theoretical Computer Science Conference (ITCS), pages 45:1–45:21, 2018.
• J. W. Tukey. Mathematics and picturing of data. In Proceedings of ICM, volume 6, pages 523–531, 1975.
• D. P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(1-2):1–157, 2014.