Adaptive Sketching for Fast and Convergent Canonical Polyadic Decomposition

ICML, pp. 3566–3575, 2020.

Abstract:

This work considers the canonical polyadic decomposition (CPD) of tensors using proximally regularized sketched alternating least squares algorithms. First, it establishes a sublinear rate of convergence for proximally regularized sketched CPD algorithms under two natural conditions that are known to be satisfied by many popular forms of sketching. ...

Introduction
Highlights
  • Tensors of ever larger sizes appear with growing frequency in many applications including data mining (Papalexakis et al, 2017), signal processing (Cichocki et al, 2015), video analysis (Sobral et al, 2015), and more (Fanaee-T & Gama, 2016).
  • We focus on the CANDECOMP/PARAFAC/canonical polyadic decomposition (CPD) (Bro, 1997), a generalization of the matrix singular value decomposition that uncovers a small set of latent factors describing each mode, or independent dimension, of the tensor.
  • We focus on sketched CPD alternating least squares (CPD-ALS) algorithms in particular, for which no prior work addresses the important question of hyperparameter selection, and no prior work establishes that these algorithms converge, or even that the approximation error decreases.
  • While CPD-MWU, as set forth in Algorithm 1, uses proximal regularization, previous research has shown that Tikhonov regularization is effective for accelerating the decomposition of noisy tensors when used with sketching (Aggour et al, 2018); a minimal sketch of one regularized, sketched ALS sweep appears after this list.
  • We investigated the performance of CPD-MWU when Tikhonov regularization is substituted for proximal regularization, comparing it to the performance of standard ALS and Sketched CPD.
  • These regularization and sketching parameters were chosen for Sketched CPD by conducting an expensive grid search to identify a pairing that gives the fastest runtime and lowest residual error.
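To make the sketched, regularized updates above concrete, here is a minimal numpy sketch of one proximally regularized, sketched CPD-ALS sweep over a third-order tensor. It is not the paper's Algorithm 1: uniform row sampling as the sketch, the fixed sketch size s, and the proximal weight mu are illustrative assumptions, and CPD-MWU additionally adapts the sketching rate between sweeps.

```python
import numpy as np

rng = np.random.default_rng(0)

def khatri_rao(U, V):
    """Column-wise Khatri-Rao product: (m x R) and (n x R) -> (m*n x R)."""
    m, R = U.shape
    n, _ = V.shape
    return (U[:, None, :] * V[None, :, :]).reshape(m * n, R)

def sketched_prox_ls(Z, Y, prev, s, mu):
    """Solve min_F ||S (Z F^T - Y)||_F^2 + mu ||F - prev||_F^2, where S
    uniformly samples s of the rows of the overdetermined system Z F^T = Y."""
    idx = rng.choice(Z.shape[0], size=min(s, Z.shape[0]), replace=False)
    Zs, Ys = Z[idx], Y[idx]
    R = Z.shape[1]
    G = Zs.T @ Zs + mu * np.eye(R)           # regularized Gram matrix
    return np.linalg.solve(G, Zs.T @ Ys + mu * prev.T).T

def cpd_als_sweep(X, A, B, C, s, mu):
    """One sketched, proximally regularized ALS sweep over the three modes."""
    I, J, K = X.shape
    X1 = X.reshape(I, J * K)                     # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # mode-3 unfolding
    A = sketched_prox_ls(khatri_rao(B, C), X1.T, A, s, mu)
    B = sketched_prox_ls(khatri_rao(A, C), X2.T, B, s, mu)
    C = sketched_prox_ls(khatri_rao(A, B), X3.T, C, s, mu)
    return A, B, C
```

Full CPD-ALS simply repeats such sweeps until the residual stops improving; the paper's convergence analysis concerns sweeps of this general form.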
Results
  • The authors investigated the performance of CPD-MWU when Tikhonov regularization is substituted for proximal regularization, comparing it to the performance of standard ALS and Sketched CPD; a sketch of the Tikhonov variant of the sketched solve follows this list.
  • These regularization and sketching parameters were chosen for Sketched CPD by conducting an expensive grid search to identify a pairing that gives the fastest runtime and lowest residual error.
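The substitution in this experiment amounts to changing the regularizer in the per-mode least squares solve. Under the same illustrative assumptions as the sweep sketched above, a Tikhonov variant shrinks toward zero instead of toward the previous iterate:

```python
import numpy as np

rng = np.random.default_rng(0)

def sketched_tikhonov_ls(Z, Y, s, lam):
    """Solve min_F ||S (Z F^T - Y)||_F^2 + lam ||F||_F^2: Tikhonov replaces
    the proximal pull toward the previous factor with shrinkage toward zero."""
    idx = rng.choice(Z.shape[0], size=min(s, Z.shape[0]), replace=False)
    Zs, Ys = Z[idx], Y[idx]
    R = Z.shape[1]
    return np.linalg.solve(Zs.T @ Zs + lam * np.eye(R), Zs.T @ Ys).T
```

The only change from the proximal solve is the right-hand side: the mu * prev.T term disappears, so the regularizer no longer anchors successive iterates to each other.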
Conclusion
  • Conclusions & Future Work

    This work establishes the sublinear convergence rate of sketched CPD-ALS algorithms, and introduces CPD-MWU, a regularized, sketched CPD-ALS algorithm that dynamically selects the sketching rate to balance computational efficiency and decomposition accuracy; a generic sketch of the multiplicative-weights selection step follows.
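The multiplicative weights update (MWU) that gives CPD-MWU its name can be pictured as a bandit-style selector over a grid of candidate sketching rates. The candidate grid, the step size eta, and the placeholder loss below are illustrative assumptions; the paper defines its own notion of per-sweep loss balancing error reduction against time spent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical grid of candidate sketching rates (fraction of rows sampled).
rates = [0.05, 0.1, 0.25, 0.5, 1.0]
weights = np.ones(len(rates))
eta = 0.5  # MWU step size; an assumed value, not taken from the paper

def observe_loss(rate):
    """Placeholder loss in [0, 1]. In CPD-MWU this would reflect how much a
    sketched ALS sweep at this rate reduced the residual, per unit runtime."""
    return rng.uniform(0.0, 1.0)

for sweep in range(20):
    p = weights / weights.sum()
    i = rng.choice(len(rates), p=p)       # sample a sketching rate
    loss = observe_loss(rates[i])         # run a sweep, measure its loss
    weights[i] *= np.exp(-eta * loss)     # multiplicative weights update

print("learned preference over rates:", np.round(weights / weights.sum(), 3))
```

Rates that repeatedly incur low loss keep more of their weight and are sampled more often, which is how the algorithm can shift toward cheaper sketches when they suffice and back toward heavier ones when accuracy stalls.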
Summary
  • Introduction:

    Tensors of ever larger sizes appear with growing frequency in many applications including data mining (Papalexakis et al, 2017), signal processing (Cichocki et al, 2015), video analysis (Sobral et al, 2015), and more (Fanaee-T & Gama, 2016).
  • Many of these applications use low-rank decompositions as a fundamental primitive in extracting latent factors from these tensorial datasets.
  • The corresponding rank-R CPD of a tensor X is X ≈ Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r, where a_r, b_r, c_r are the r-th columns of the factor matrices and ∘ denotes the vector outer product (Kolda & Bader, 2009); a small construction example appears after this summary.
  • Results:

    The authors investigated the performance of CPD-MWU when Tikhonov regularization is substituted for proximal regularization, comparing it to the performance of standard ALS and Sketched CPD.
  • These regularization and sketching parameters were chosen for Sketched CPD by conducting an expensive grid search to identify a pairing that gives the fastest runtime and lowest residual error.
  • Conclusion:

    Conclusions & Future Work

    This work establishes the sublinear convergence rate of sketched CPD-ALS algorithms, and introduces CPD-MWU, a regularized, sketched CPD-ALS algorithm that dynamically selects the sketching rate to balance computational efficiency and decomposition accuracy.
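As a quick illustration of the rank-R CPD model referenced in the summary, the snippet below builds a third-order tensor from three hypothetical factor matrices; the dimensions are arbitrary.

```python
import numpy as np

# Rank-R CPD model: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r].
rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3                  # arbitrary example dimensions
A = rng.standard_normal((I, R))          # mode-1 factor matrix
B = rng.standard_normal((J, R))          # mode-2 factor matrix
C = rng.standard_normal((K, R))          # mode-3 factor matrix
X = np.einsum('ir,jr,kr->ijk', A, B, C)  # sum of R rank-one outer products
assert X.shape == (I, J, K)              # exact rank-R tensor by construction
```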
Tables
  • Table 1: Average and standard deviation of runtime (in seconds)
  • Table 2: Average residual error across 30 ill-conditioned tensor decompositions per data point, in which the runtime was fixed to
  • Table 3
  • Table 4: Average and standard deviation of runtime (in seconds) and residual error across 10 decompositions of the NELL knowledge base extract after up to 30 minutes
  • Table 5: Average and standard deviation of runtime (in seconds) and residual error across 10 decompositions of the NELL knowledge base extract after up to 2 hours
References
  • Acar, E., Dunlavy, D. M., and Kolda, T. G. A scalable optimization approach for fitting canonical tensor decompositions. J. Chemometrics, 25(2):67–86, 2011.
  • Hadoop. In IEEE Int. Conf. Big Data (Big Data), pp. 294–301, December 2016. doi: 10.1109/BigData.2016.7840615.
  • Aggour, K. S., Gittens, A., and Yener, B. Accelerating a distributed CPD algorithm for large dense, skewed tensors. In IEEE Int. Conf. Big Data (Big Data), pp. 408–417, December 2018.
  • Battaglino, C., Ballard, G., and Kolda, T. G. A practical randomized CP tensor decomposition. SIAM J. Matrix Anal. Appl., 39(2):876–901, 2018. doi: 10.1137/17m1112303.
  • Beck, A. First-Order Methods in Optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2017.
  • Bhojanapalli, S. and Sanghavi, S. A new sampling technique for tensors. arXiv:1502.05023 [stat.ML], 2015.
  • Bro, R. PARAFAC. Tutorial and applications. Chemometrics Intell. Laboratory Syst., 38(2):149–171, October 1997. doi: 10.1016/s0169-7439(97)00032-4.
  • Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka, Jr., E. R., and Mitchell, T. M. Toward an architecture for never-ending language learning. In Proc. 24th AAAI Conf. Artificial Intell. (AAAI), pp. 1306–1313, July 2010.
  • Cesa-Bianchi, N. and Lugosi, G. Prediction, Learning, and Games. Cambridge Univ. Press, Cambridge, UK, 2006.
  • Cheng, D., Peng, R., Liu, Y., and Perros, I. SPALS: Fast alternating least squares via implicit leverage scores sampling. In Proc. 29th Adv. Neural Info. Process. Syst. (NIPS), pp. 721–729, 2016.
  • Cichocki, A., Mandic, D., De Lathauwer, L., Zhou, G., Zhao, Q., Caiafa, C., and Phan, A.-H. Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Process. Mag., 32(2):145–163, March 2015. doi: 10.1109/msp.2013.2297439.
  • Fanaee-T, H. and Gama, J. Tensor-based anomaly detection: An interdisciplinary survey. Knowl.-Based Syst., 98:130–147, 2016. doi: 10.1016/j.knosys.2016.01.027.
  • Fu, X., Gao, C., Wai, H.-T., and Huang, K. Block-randomized stochastic proximal gradient for constrained low-rank tensor factorization. In IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 7485–7489, 2019.
  • Gujral, E., Pasricha, R., and Papalexakis, E. E. SamBaTen: Sampling-based batch incremental tensor decomposition. In Proc. SIAM Int. Conf. Data Mining (SDM), pp. 387–395, May 2018.
  • Kang, U., Papalexakis, E. E., Harpale, A., and Faloutsos, C. GigaTensor: Scaling tensor analysis up by 100 times - algorithms and discoveries. In Proc. 18th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (KDD), pp. 316–324, 2012.
  • Kiers, H. A. L., ten Berge, J. M. F., and Bro, R. PARAFAC2 - Part I. A direct fitting algorithm for the PARAFAC2 model. J. Chemometrics, 13(3-4):275–294, 1999.
  • Kolda, T. G. and Bader, B. W. Tensor decompositions and applications. SIAM Rev., 51(3):455–500, August 2009. doi: 10.1137/07070111X.
  • Li, N., Kindermann, S., and Navasca, C. Some convergence results on the regularized alternating least-squares method for tensor decomposition. Linear Algebra Appl., 438(2):796–812, January 2013. doi: 10.1016/j.laa.2011.12.002.
  • Nguyen, N. H., Drineas, P., and Tran, T. D. Tensor sparsification via a bound on the spectral norm of random tensors. Information and Inference: A Journal of the IMA, 4(3):195–229, 2015.
  • Papalexakis, E. E., Faloutsos, C., and Sidiropoulos, N. D. ParCube: Sparse parallelizable tensor decompositions. In Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases (ECML-PKDD), pp. 521–536, 2012.
  • Papalexakis, E. E., Faloutsos, C., and Sidiropoulos, N. D. Tensors for data mining and data fusion: Models, applications, and scalable algorithms. ACM Trans. Intell. Syst. Technol. (TIST), 8(2):16:1–16:44, January 2017.
  • Reynolds, M. J., Doostan, A., and Beylkin, G. Randomized alternating least squares for canonical tensor decompositions: Application to a PDE with random data. SIAM J. Scientific Comput., 38(5):2634–2664, 2016. doi: 10.1137/15m1042802.
  • Sidiropoulos, N. D., Papalexakis, E. E., and Faloutsos, C. A parallel algorithm for big tensor decomposition using randomly compressed cubes (PARACOMP). In IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 1–5, May 2014. doi: 10.1109/ICASSP.2014.6853546.
  • Sobral, A., Javed, S., Jung, S. K., Bouwmans, T., and Zahzah, E.-H. Online stochastic tensor decomposition for background subtraction in multispectral video sequences. In IEEE Int. Conf. Comput. Vision Workshop (ICCVW), pp. 946–953, December 2015. doi: 10.1109/ICCVW.2015.125.
  • Song, Z., Woodruff, D., and Zhang, H. Sublinear time orthogonal tensor decomposition. In Proc. 29th Adv. Neural Info. Process. Syst. (NIPS), pp. 793–801, 2016.
  • Song, Z., Woodruff, D. P., and Zhong, P. Relative error tensor low rank approximation. Electron. Colloq. Comput. Complexity (ECCC), 25:103, 2018.
  • Tomasi, G. and Bro, R. A comparison of algorithms for fitting the PARAFAC model. Comput. Statist. Data Anal., 50(7):1700–1734, April 2006.
  • Tsourakakis, C. E. MACH: Fast randomized tensor decompositions. In Proc. SIAM Int. Conf. Data Mining (SDM), pp. 689–700, 2010. doi: 10.1137/1.9781611972801.60.
  • Unknown. Man sitting on a bench, 2018. URL https://videos.pexels.com/videos/man-sitting-on-a-bench-853751. Accessed Jan. 10, 2019.
  • Vervliet, N. and De Lathauwer, L. A randomized block sampling approach to canonical polyadic decomposition of large-scale tensors. IEEE J. Sel. Topics Signal Process., 10(2):284–295, March 2016. doi: 10.1109/jstsp.2015.2503260.
  • Wang, S., Gittens, A., and Mahoney, M. W. Sketched ridge regression: Optimization perspective, statistical perspective, and model averaging. J. Mach. Learn. Res., 18(1):8039–8088, 2017.
  • Wang, Y., Tung, H.-Y., Smola, A. J., and Anandkumar, A. Fast and guaranteed tensor decomposition via sketching. In Proc. 28th Adv. Neural Info. Process. Syst. (NIPS), pp. 991–999, 2015.
  • Xu, Y. and Yin, W. A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci., 6(3):1758–1789, 2013.
  • Yang, B., Zamzam, A., and Sidiropoulos, N. D. ParaSketch: Parallel tensor factorization via sketching. In Proc. SIAM Int. Conf. Data Mining (SDM), pp. 396–404, May 2018.
  • Yu, R., Purushotham, S., and Liu, Y. Efficient spatiotemporal sampling via low-rank tensor sketching. In Proc. Time Series Workshop at Conf. Neural Info. Process. Syst. (NIPS), 2015.
  • Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., and Stoica, I. Spark: Cluster computing with working sets. In Proc. 2nd USENIX Conf. Hot Topics Cloud Comput. (HotCloud), pp. 10–10, 2010.