# Adaptive Sketching for Fast and Convergent Canonical Polyadic Decomposition

ICML, pp. 3566–3575, 2020.


Abstract:

This work considers the canonical polyadic decomposition (CPD) of tensors using proximally regularized sketched alternating least squares algorithms. First, it establishes a sublinear rate of convergence for proximally regularized sketched CPD algorithms under two natural conditions that are known to be satisfied by many popular forms of sketching. …


Introduction

- Tensors of ever larger sizes appear with growing frequency in many applications including data mining (Papalexakis et al., 2017), signal processing (Cichocki et al., 2015), video analysis (Sobral et al., 2015), and more (Fanaee-T & Gama, 2016)
- Many of these applications use low-rank decompositions as a fundamental primitive in extracting latent factors from these tensorial datasets.
- The corresponding rank-$R$ CPD expresses a third-order tensor as a sum of $R$ rank-one terms, $\mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r$ (Kolda & Bader, 2009); a minimal NumPy illustration follows this list
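
A minimal NumPy illustration (not from the paper) of how rank-R factor matrices reconstruct a third-order tensor:

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rebuild a third-order tensor from rank-R CPD factor matrices.

    A, B, C have shapes (I, R), (J, R), (K, R); the result is the sum
    of the R outer products of the factors' r-th columns.
    """
    return np.einsum('ir,jr,kr->ijk', A, B, C)

# Tiny usage example: a random rank-3 tensor of shape (4, 5, 6).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((5, 3))
C = rng.standard_normal((6, 3))
X = cp_reconstruct(A, B, C)  # X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
```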

Highlights

- Tensors of ever larger sizes appear with growing frequency in many applications including data mining (Papalexakis et al., 2017), signal processing (Cichocki et al., 2015), video analysis (Sobral et al., 2015), and more (Fanaee-T & Gama, 2016)
- We focus on the CANDECOMP/PARAFAC/canonical polyadic decomposition (CPD) (Bro, 1997), a generalization of the matrix singular value decomposition that uncovers a small set of latent factors describing each mode, or independent dimension, of the tensor
- We focus on sketched CPD alternating least squares (CPD-ALS) algorithms in particular, for which no prior work addresses the important question of hyperparameter selection, and no prior work establishes that these algorithms converge or even that the approximation error decreases
- CPD-MWU, as set forth in Algorithm 1, uses proximal regularization; previous research has shown that Tikhonov regularization, used with sketching, is effective for accelerating the decomposition of noisy tensors (Aggour et al., 2018). A sketch of one such regularized, sketched update follows this list
- We investigated the performance of CPD-MWU when Tikhonov regularization is substituted for proximal regularization, comparing it to the performance of standard ALS and Sketched CPD
- These regularization and sketching parameters were chosen for Sketched CPD by conducting an expensive grid search to identify a pairing that gives the fastest runtime and lowest residual error
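
The paper's Algorithm 1 is not reproduced here; the following is a minimal sketch of the kind of proximally regularized, sketched ALS update these highlights describe, assuming uniform sampling of Khatri-Rao rows and the unfolding convention $X_{(1)} = A(C \odot B)^T$. Names and parameters are illustrative:

```python
import numpy as np

def sketched_prox_als_step(X1, A_prev, B, C, s, mu, rng):
    """One sketched, proximally regularized ALS update for the mode-1 factor.

    Minimizes ||X1 - A @ KR.T||_F^2 + mu * ||A - A_prev||_F^2 over A, where
    KR = khatri_rao(C, B), using only s uniformly sampled rows of KR
    (equivalently, s sampled columns of the mode-1 unfolding X1).
    """
    J, R = B.shape
    idx = rng.choice(X1.shape[1], size=s, replace=False)  # sampled KR rows
    j, k = idx % J, idx // J   # KR row k*J + j pairs row j of B with row k of C
    KR_s = C[k] * B[j]         # (s, R) block of sampled Khatri-Rao rows
    X1_s = X1[:, idx]          # matching columns of the unfolding
    # Regularized normal equations:
    #   (KR_s^T KR_s + mu I) A^T = KR_s^T X1_s^T + mu A_prev^T
    G = KR_s.T @ KR_s + mu * np.eye(R)
    rhs = KR_s.T @ X1_s.T + mu * A_prev.T
    return np.linalg.solve(G, rhs).T
```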

Results

- The authors investigated the performance of CPD-MWU when Tikhonov regularization is substituted for proximal regularization, comparing it to the performance of standard ALS and Sketched CPD.
- These regularization and sketching parameters were chosen for Sketched CPD by conducting an expensive grid search to identify a pairing that gives the fastest runtime and lowest residual error; a minimal sketch of this kind of search follows this list.
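
The grid values and ranking rule below are placeholders rather than the paper's actual settings; this only sketches the search protocol described above, with a stand-in solver in place of a real sketched CPD run:

```python
import itertools

def grid_search(run_cpd, mus, sketch_rates):
    """Exhaustively evaluate each (regularization, sketching-rate) pair.

    run_cpd(mu, rate) is assumed to return (residual_error, runtime_secs);
    candidates are ranked by residual error first, then by runtime.
    """
    results = []
    for mu, rate in itertools.product(mus, sketch_rates):
        err, secs = run_cpd(mu, rate)
        results.append((err, secs, mu, rate))
    return min(results)  # lexicographic: lowest error, ties broken by time

# Usage with a dummy objective; in practice run_cpd would execute a full
# sketched CPD decomposition at the given settings and report error/time.
best_err, best_secs, best_mu, best_rate = grid_search(
    run_cpd=lambda mu, rate: (mu + rate, 1.0 / rate),
    mus=[1e-3, 1e-2, 1e-1],
    sketch_rates=[0.01, 0.05, 0.1],
)
```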

Conclusion

**Conclusions & Future Work**

This work establishes the sublinear convergence rate of sketched CPD-ALS algorithms, and introduces CPD-MWU, a regularized, sketched CPD-ALS algorithm that dynamically selects the sketching rate to balance computational efficiency and decomposition accuracy.
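
The paper's reward definition and constants are not reproduced here; the following is a generic multiplicative-weights sketch of adaptively selecting a sketching rate from a small candidate set, with a simulated reward standing in for measured progress per unit cost:

```python
import numpy as np

rng = np.random.default_rng(0)
rates = np.array([0.01, 0.05, 0.1, 0.2])  # candidate sketching rates
weights = np.ones(len(rates))
eta = 0.5                                  # assumed MWU learning rate

def iteration_reward(rate):
    """Stand-in reward: error decrease per unit work for one ALS sweep.

    In the real algorithm this would be measured by running a sketched
    iteration at the chosen rate; here it is simulated for illustration.
    """
    error_drop = 1.0 - np.exp(-5.0 * rate)   # diminishing returns in rate
    return error_drop / (1.0 + 10.0 * rate)  # penalize larger sketches

for _ in range(100):
    probs = weights / weights.sum()
    k = rng.choice(len(rates), p=probs)      # draw a rate from the weights
    weights[k] *= np.exp(eta * iteration_reward(rates[k]))  # reward good rates
```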

Tables

- Table 1: Average and standard deviation of runtime (in seconds)
- Table 2: Average residual error across 30 ill-conditioned tensor decompositions per data point, in which the runtime was fixed to …
- Table 4: Average and standard deviation of runtime (in seconds) and residual error across 10 decompositions of the NELL knowledge base extract after up to 30 minutes
- Table 5: Average and standard deviation of runtime (in seconds) and residual error across 10 decompositions of the NELL knowledge base extract after up to 2 hours

Related work

- Early work on fast randomized tensor decomposition focused on entry-wise sparsification (Tsourakakis, 2010; Nguyen et al., 2015); several groups then investigated sketched ALS algorithms (Bhojanapalli & Sanghavi, 2015; Reynolds et al., 2016; Wang et al., 2015; Yu et al., 2015; Vervliet & De Lathauwer, 2016; Song et al., 2016; Cheng et al., 2016). More recently, two groups within the data mining community (Gujral et al., 2018; Yang et al., 2018) refined the earlier ParCube system (Papalexakis et al., 2012), which uses a block-sampling approach to enhance the scalability of CP decomposition, and Battaglino et al. (2018) proposed two sketching approaches for ALS. Most of these works do not provide guarantees of convergence to critical points of the CPD objective. One exception, Wang et al. (2015), provides strong convergence guarantees for a sketched tensor power method, which Cheng et al. (2016) argue is less efficient than sketched ALS.

Of these works, the most closely related to our approach are Cheng et al. (2016), Battaglino et al. (2018), and Aggour et al. (2018). Cheng et al. (2016) introduce the SPALS algorithm, which accelerates ALS by sampling rows of the Khatri-Rao product with probability proportional to their statistical leverage scores. Battaglino et al. (2018) propose two sketching approaches for CPD-ALS, CPRAND and CPRAND-MIX: CPRAND samples rows of the Khatri-Rao product uniformly at random, while CPRAND-MIX first mixes the modes of an input tensor to make it incoherent before applying CPRAND. Aggour et al. (2018) demonstrated that regularization works with sketching to further accelerate the convergence of ALS. A sketch of leverage-score-based row sampling follows this paragraph.
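
SPALS's exact estimator is not reproduced here; the following is a minimal sketch of leverage-score-based sampling of Khatri-Rao rows, using the standard bound that the leverage score of a Khatri-Rao row is at most the product of the corresponding factor rows' leverage scores:

```python
import numpy as np

def kr_leverage_estimates(B, C):
    """Estimate Khatri-Rao row leverage scores from the factor matrices.

    The leverage score of row k*J + j of khatri_rao(C, B) is bounded by
    the product of the leverage scores of row j of B and row k of C, so
    the normalized products give a usable over-sampling distribution.
    """
    def leverage(M):
        Q, _ = np.linalg.qr(M)        # thin QR; row norms of Q are the scores
        return np.sum(Q**2, axis=1)
    scores = np.outer(leverage(C), leverage(B)).ravel()  # index k*J + j
    return scores / scores.sum()

# Sample 32 Khatri-Rao rows (with replacement) by the leverage estimates.
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 4))
C = rng.standard_normal((60, 4))
idx = rng.choice(B.shape[0] * C.shape[0], size=32, p=kr_leverage_estimates(B, C))
```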

Reference

- Acar, E., Dunlavy, D. M., and Kolda, T. G. A scalable optimization approach for fitting canonical tensor decompositions. J. Chemometrics, 25(2):67–86, 2011.
- Hadoop. In IEEE Int. Conf. Big Data (Big Data), pp. 294–301, December 2016. doi: 10.1109/BigData.2016.7840615.
- Aggour, K. S., Gittens, A. A. T., and Yener, B. Accelerating a distributed CPD algorithm for large dense, skewed tensors. In IEEE Int. Conf. Big Data (Big Data), pp. 408–417, December 2018.
- Battaglino, C., Ballard, G., and Kolda, T. G. A practical randomized CP tensor decomposition. SIAM J. Matrix Anal. Appl., 39(2):876–901, 2018. doi: 10.1137/17m1112303.
- Beck, A. First-Order Methods in Optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2017.
- Bhojanapalli, S. and Sanghavi, S. A new sampling technique for tensors. arXiv:1502.05023 [stat.ML], 2015.
- Bro, R. PARAFAC. Tutorial and applications. Chemometrics Intell. Laboratory Syst., 38(2):149–171, October 1997. doi: 10.1016/s0169-7439(97)00032-4.
- Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka, Jr., E. R., and Mitchell, T. M. Toward an architecture for never-ending language learning. In Proc. 24th AAAI Conf. Artificial Intell., AAAI’10, pp. 1306–1313, July 2010.
- Cesa-Bianchi, N. and Lugosi, G. Prediction, Learning, and Games. Cambridge Univ. Press, Cambridge, UK, 2006.
- Cheng, D., Peng, R., Liu, Y., and Perros, I. SPALS: Fast alternating least squares via implicit leverage scores sampling. In Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I., and Garnett, R. (eds.), Proc. 29th Adv. Neural Info. Process. Syst. (NIPS), pp. 721–729. Curran Associates, Inc., 2016.
- Cichocki, A., Mandic, D., De Lathauwer, L., Zhou, G., Zhao, Q., Caiafa, C., and Phan, A.-H. Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Process. Mag., 32(2):145–163, March 2015. doi: 10.1109/msp.2013.2297439.
- Fanaee-T, H. and Gama, J. Tensor-based anomaly detection: An interdisciplinary survey. Knowl.-Based Syst., 98: 130–147, 2016. doi: 10.1016/j.knosys.2016.01.027.
- Fu, X., Gao, C., Wai, H.-T., and Huang, K. Block-randomized stochastic proximal gradient for constrained low-rank tensor factorization. In IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 7485–7489, 2019.
- Gujral, E., Pasricha, R., and Papalexakis, E. E. SamBaTen: Sampling-based batch incremental tensor decomposition. In Proc. SIAM Int. Conf. Data Mining (SDM), pp. 387– 395, May 2018.
- Kang, U., Papalexakis, E. E., Harpale, A., and Faloutsos, C. GigaTensor: Scaling tensor analysis up by 100 times - algorithms and discoveries. In Proc. 18th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (KDD), pp. 316–324, 2012.
- Kiers, H. A. L., ten Berge, J. M. F., and Bro, R. PARAFAC2 - Part I. A direct fitting algorithm for the PARAFAC2 model. J. Chemometrics, 13(3-4):275–294, 1999.
- Kolda, T. G. and Bader, B. W. Tensor decompositions and applications. SIAM Rev., 51(3):455–500, August 2009. ISSN 0036-1445. doi: 10.1137/07070111X.
- Li, N., Kindermann, S., and Navasca, C. Some convergence results on the regularized alternating least-squares method for tensor decomposition. Linear Algebra Appl., 438(2): 796–812, January 2013. doi: 10.1016/j.laa.2011.12.002.
- Nguyen, N. H., Drineas, P., and Tran, T. D. Tensor sparsification via a bound on the spectral norm of random tensors. Information and Inference: A Journal of the IMA, 4(3):195–229, 2015.
- Papalexakis, E. E., Faloutsos, C., and Sidiropoulos, N. D. ParCube: Sparse parallelizable tensor decompositions. In Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases (ECMLKDD), pp. 521–536, 2012.
- Papalexakis, E. E., Faloutsos, C., and Sidiropoulos, N. D. Tensors for data mining and data fusion: Models, applications, and scalable algorithms. ACM Trans. Intell. Syst. Technol. (TIST), 8(2):16:1–16:44, January 2017.
- Reynolds, M. J., Doostan, A., and Beylkin, G. Randomized alternating least squares for canonical tensor decompositions: Application to a PDE with random data. SIAM J. Scientific Comput., 38(5):2634–2664, 2016. doi: 10.1137/15m1042802.
- Sidiropoulos, N. D., Papalexakis, E. E., and Faloutsos, C. A parallel algorithm for big tensor decomposition using randomly compressed cubes (PARACOMP). In IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 1–5, May 2014. doi: 10.1109/ICASSP.2014.6853546.
- Sobral, A., Javed, S., Jung, S. K., Bouwmans, T., and Zahzah, E.-H. Online stochastic tensor decomposition for background subtraction in multispectral video sequences. In IEEE Int. Conf. Comput. Vision Workshop (ICCVW), pp. 946–953, December 2015. doi: 10.1109/ICCVW.2015.125.
- Song, Z., Woodruff, D., and Zhang, H. Sublinear time orthogonal tensor decomposition. In Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I., and Garnett, R. (eds.), Proc. 29th Adv. Neural Info. Process. Syst. (NIPS), pp. 793–801. Curran Associates, Inc., 2016.
- Song, Z., Woodruff, D. P., and Zhong, P. Relative error tensor low rank approximation. Electron. Colloq. Comput. Complexity (ECCC), 25:103, 2018.
- Tomasi, G. and Bro, R. A comparison of algorithms for fitting the PARAFAC model. Comput. Statist. Data Anal., 50(7):1700–1734, April 2006.
- Tsourakakis, C. E. MACH: Fast randomized tensor decompositions. Proc. SIAM Int. Conf. Data Mining (SDM), pp. 689–700, 2010. doi: 10.1137/1.9781611972801.60.
- Unknown. Man sitting on a bench, 2018. URL https://videos.pexels.com/videos/man-sitting-on-a-bench-853751. Accessed on Jan. 10, 2019.
- Vervliet, N. and De Lathauwer, L. A randomized block sampling approach to canonical polyadic decomposition of large-scale tensors. IEEE J. Sel. Topics Signal Process., 10(2):284–295, March 2016. doi: 10.1109/jstsp.2015.2503260.
- Wang, S., Gittens, A., and Mahoney, M. W. Sketched ridge regression: optimization perspective, statistical perspective, and model averaging. The Journal of Machine Learning Research, 18(1):8039–8088, 2017.
- Wang, Y., Tung, H.-Y., Smola, A. J., and Anandkumar, A. Fast and guaranteed tensor decomposition via sketching. In Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M., and Garnett, R. (eds.), Proc. 28th Adv. Neural Info. Process. Syst. (NIPS), pp. 991–999. Curran Associates, Inc., 2015.
- Xu, Y. and Yin, W. A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on Imaging Sciences, 6(3):1758–1789, 2013.
- Yang, B., Zamzam, A., and Sidiropoulos, N. D. ParaSketch: Parallel tensor factorization via sketching. In Proc. SIAM Int. Conf. Data Mining (SDM), pp. 396–404, May 2018.
- Yu, R., Purushotham, S., and Liu, Y. Efficient spatiotemporal sampling via low-rank tensor sketching. In Proc. Time Series Workshop at Conf. Neural Info. Process. Syst. (NIPS), pp. 5, 2015.
- Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., and Stoica, I. Spark: Cluster computing with working sets. In Proc. 2nd USENIX Conf. Hot Topics Cloud Comput. (HotCloud), pp. 10–10, 2010.
