# Input-Sparsity Low Rank Approximation in Schatten Norm

ICML 2020.

Abstract:

We give the first input-sparsity time algorithms for the rank-$k$ low rank approximation problem in every Schatten norm. Specifically, for a given $n\times n$ matrix $A$, our algorithm computes $Y,Z\in \mathbb{R}^{n\times k}$, which, with high probability, satisfy $\|A-YZ^T\|_p \leq (1+\epsilon)\|A-A_k\|_p$, where $\|M\|_p$ is the Schatten $p$-norm of $M$ ...

Introduction

- A common task in processing or analyzing large-scale datasets is to approximate a large matrix $A \in \mathbb{R}^{m\times n}$ ($m \geq n$) with a low-rank matrix.
- The Schatten $p$-norm of a matrix $M$ with singular values $\sigma_1(M), \sigma_2(M), \dots$ is $\|M\|_p = \left(\sum_i \sigma_i(M)^p\right)^{1/p}$.
- It is a well-known fact (Mirsky’s Theorem) that the optimal solution for general Schatten norms coincides with the optimal rank-$k$ matrix $A_k$ for the Frobenius norm, given by the SVD.
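The definition and Mirsky's theorem above can be checked numerically on a small dense matrix. The following is a minimal sketch using a full SVD (so it is not an input-sparsity time method); the helper names `schatten_norm` and `best_rank_k` are illustrative and do not come from the paper.

```python
import numpy as np

def schatten_norm(M, p):
    """Schatten p-norm: the l_p norm of the vector of singular values of M."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

def best_rank_k(A, k):
    """Truncated SVD A_k, the optimal rank-k approximation in every Schatten norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 20))
k = 5
A_k = best_rank_k(A, k)
s = np.linalg.svd(A, compute_uv=False)

# The residual A - A_k has singular values sigma_{k+1}, ..., sigma_n,
# so its Schatten p-norm should match the l_p norm of that tail.
for p in (1, 2, 3):
    tail = float(np.sum(s[k:] ** p) ** (1.0 / p))
    print(p, schatten_norm(A - A_k, p), tail)
```

For $p = 2$ the function reduces to the Frobenius norm, and for $p = 1$ to the nuclear norm.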

Highlights

- A common task in processing or analyzing large-scale datasets is to approximate a large matrix $A \in \mathbb{R}^{m\times n}$ ($m \geq n$) with a low-rank matrix. Often this is done with respect to the Frobenius norm, that is, the objective is to minimize the error $\|A - X\|_F$ over all rank-$k$ matrices $X \in \mathbb{R}^{m\times n}$ for a rank parameter $k$. It is well known that the optimal solution is $A_k = P_L A = A P_R$, where $P_L$ is the orthogonal projection onto the top $k$ left singular vectors of $A$, and $P_R$ is the orthogonal projection onto the top $k$ right singular vectors of $A$.
- A number of efficient methods are known, which are based on dimensionality reduction techniques such as random projections, importance sampling, and other sketching methods, with running times [1,2] of $O(\mathrm{nnz}(A) + m\,\mathrm{poly}(k/\epsilon))$, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$. This is significantly faster than the singular value decomposition, which takes $\Theta(mn^{\omega-1})$ time, where $\omega$ is the exponent of matrix multiplication.
- Our goal is to find an orthogonal projection $Q$ for which $\|A(I - Q)\|_1 \leq (1 + O(\epsilon))\|A - A_k\|_1$.
- In addition to the solution provided by our algorithm, we consider a natural candidate for a low-rank approximation algorithm, which is the solution in Frobenius norm, that is, a rank-$k$ matrix $X$ for which $\|A - X\|_F \leq (1 + \epsilon)\|A - A_k\|_F$.
- Take $S$ to be a Count-Sketch matrix and let $Z$ be an $n \times k$ matrix whose columns form an orthonormal basis of the top-$k$ right singular vectors of $SA$.
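The Count-Sketch construction in the last bullet can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the sketch size `r` is a hypothetical parameter (the text does not give its value), the function names are made up, and the candidate solution is formed as $X = AZZ^T$, which is one natural way to turn $Z$ into a rank-$k$ matrix.

```python
import numpy as np
from scipy import sparse

def count_sketch_matrix(r, m, rng):
    """r x m Count-Sketch matrix: each column has a single random +-1 entry."""
    rows = rng.integers(0, r, size=m)          # hash bucket for each column
    signs = rng.choice([-1.0, 1.0], size=m)    # random sign for each column
    return sparse.csr_matrix((signs, (rows, np.arange(m))), shape=(r, m))

def sketched_right_basis(A, k, r, rng):
    """Orthonormal basis Z (n x k) of the top-k right singular vectors of SA."""
    S = count_sketch_matrix(r, A.shape[0], rng)
    SA = S @ A                                  # touches each nonzero of A once
    if sparse.issparse(SA):
        SA = SA.toarray()
    _, _, Vt = np.linalg.svd(SA, full_matrices=False)
    return Vt[:k].T

rng = np.random.default_rng(0)
m, n, k, r = 500, 40, 5, 200                    # r is an assumed sketch size
A = (rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
     + 0.1 * rng.standard_normal((m, n)))

Z = sketched_right_basis(A, k, r, rng)
X = (A @ Z) @ Z.T                               # rank-k candidate A Z Z^T

err = np.linalg.norm(A - X, "fro")
s = np.linalg.svd(A, compute_uv=False)
opt = float(np.sqrt(np.sum(s[k:] ** 2)))        # Frobenius error of A_k
print("Frobenius error ratio:", err / opt)
```

The SVD is taken of the small $r \times n$ matrix $SA$ instead of $A$ itself, which is where the savings over a full SVD come from when $r \ll m$.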

Results

- On the other hand, if a Schatten 1-norm rank-$k$ approximation algorithm were to output only the top singular direction, it would pay a cost of $2k \cdot 1/\sqrt{k} = 2\sqrt{k}$.
- No algorithms for low-rank approximation in the Schatten p-norm were known to run in time O(nnz(A)+m poly(k/ε)) prior to this work, except for the special case of p = 2.
- In this paper the authors obtain the first provably efficient algorithms for the rank-$k$ $(1 + \epsilon)$-approximation problem with respect to the Schatten $p$-norm for every $p \geq 1$.
- It was shown by Musco and Woodruff [MW17] that computing a constant-factor low-rank approximation to $A^T A$, given only $A$, requires $\Omega(\mathrm{nnz}(A) \cdot k)$ time.
- There exists a randomized algorithm which runs in O(nnz(A) log n) + O time and outputs a matrix $C$ of $t = \Theta(\epsilon^{-2} K \log K)$ columns, which are rescaled column samples of $A$ without replacement, such that with probability at least 0.99, $(1 - \epsilon)AA^T - \eta$ ...
- The contribution of the work is primarily theoretical: an algorithm with a new and optimal runtime for low-rank approximation for any Schatten p-norm.
- In addition to the solution provided by the algorithm, the authors consider a natural candidate for a low-rank approximation algorithm, which is the solution in Frobenius norm, that is, a rank-$k$ matrix $X$ for which $\|A - X\|_F \leq (1 + \epsilon)\|A - A_k\|_F$.
- The authors report the median relative approximation error and the median running time of the algorithm and those of the Frobenius-norm algorithm among 50 independent runs for each value of k ∈ {5, 10, 20}.
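The cost $2k \cdot 1/\sqrt{k} = 2\sqrt{k}$ quoted in the first bullet above can be reproduced with a short calculation. The spectrum used here is an assumption chosen to match that arithmetic (one singular value equal to 1 followed by $2k$ singular values equal to $1/\sqrt{k}$); the surrounding text does not state the instance explicitly.

```python
import numpy as np

def residual_norm(sigma, kept, p):
    """Schatten p-norm of the residual after capturing the top `kept`
    singular directions; sigma must be sorted in nonincreasing order."""
    return float(np.sum(sigma[kept:] ** p) ** (1.0 / p))

k = 100
# Assumed spectrum: one top singular value, then 2k copies of 1/sqrt(k).
sigma = np.concatenate(([1.0], np.full(2 * k, 1.0 / np.sqrt(k))))

# Schatten 1-norm: keeping only the top direction vs. the optimal rank-k choice.
cost_top1_s1 = residual_norm(sigma, 1, p=1)    # 2k / sqrt(k) = 2 sqrt(k)
opt_s1 = residual_norm(sigma, k, p=1)          # (k + 1) / sqrt(k)

# Frobenius norm (p = 2) for the same two choices.
cost_top1_f = residual_norm(sigma, 1, p=2)     # sqrt(2k / k) = sqrt(2)
opt_f = residual_norm(sigma, k, p=2)           # sqrt((k + 1) / k)

ratio_s1 = cost_top1_s1 / opt_s1               # 2k / (k + 1)
ratio_f = cost_top1_f / opt_f                  # sqrt(2k / (k + 1))
print(ratio_s1, ratio_f)
```

On this assumed instance the same output loses a factor approaching 2 in the Schatten 1-norm but only about $\sqrt{2}$ in the Frobenius norm, which illustrates why a good Frobenius-norm solution need not be equally good in other Schatten norms.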

Conclusion

- The authors' algorithm achieves a good approximation error, less than 0.015, and surpasses the approximate Frobenius-norm solution for all such values of k.
- One can also pose the problem of low-rank approximation with respect to some function $\Phi$ of the matrix singular values, i.e., $\min \Phi(A - X)$.
- The algorithm runs in time O(nnz(A)(k + log n)) + O(n poly(k/ε)), where the hidden constants depend on α, γ and the polynomial exponents for Kε and Lε.

- Table 1: Performance of our algorithm on synthetic data compared with the approximate Frobenius-norm solution and the SVD
- Table 2: Performance of our algorithm on KOS data compared with the approximate Frobenius-norm solution

Funding

- Li was supported in part by Singapore Ministry of Education (AcRF) Tier 2 grant MOE2018-T2-1-013
- Woodruff was supported in part by Office of Naval Research (ONR) grant N00014-18-1-2562

Reference

- [CEM+15] Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC’15, pages 163–172, New York, NY, USA, 2015. Association for Computing Machinery.
- [CLM+15] Michael B. Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, and Aaron Sidford. Uniform sampling for matrix approximation. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, ITCS’15, pages 181–190, 2015.
- [CLMW11] Emmanuel J. Candes, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? J. ACM, 58(3), June 2011.
- [CMM17] Michael B. Cohen, Cameron Musco, and Christopher Musco. Input sparsity time low-rank approximation via ridge leverage score sampling. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA’17, pages 1758–1777, USA, 2017. Society for Industrial and Applied Mathematics.
- [CW17] Kenneth L. Clarkson and David P. Woodruff. Low-rank approximation and regression in input sparsity time. J. ACM, 63(6), January 2017.
- [DDH07] James Demmel, Ioana Dumitriu, and Olga Holtz. Fast linear algebra is stable. Numer. Math., 108(1):59–91, October 2007.
- [LNW19] Yi Li, Huy L. Nguyen, and David P. Woodruff. On approximating matrix norms in data streams. SIAM Journal on Computing, 48(6):1643–1697, 2019.
- [Mus18] Christopher Musco. Faster linear algebra for data analysis and machine learning. PhD thesis, MIT, 2018.
- [MW17] Cameron Musco and David P. Woodruff. Is input sparsity time possible for kernel low-rank approximation? In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pages 4438–4448, Red Hook, NY, USA, 2017. Curran Associates Inc.
- [NN13] J. Nelson and H. L. Nguyen. OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 117–126, Oct 2013.
- [Woo14] David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(1–2):1–157, 2014.
- [XCS10] Huan Xu, Constantine Caramanis, and Sujay Sanghavi. Robust PCA via outlier pursuit. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 2496–2504. Curran Associates, Inc., 2010.
- [YPCC16] Xinyang Yi, Dohyung Park, Yudong Chen, and Constantine Caramanis. Fast algorithms for robust PCA via gradient descent. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, pages 4159–4167, Red Hook, NY, USA, 2016. Curran Associates Inc.
