swCPD: Optimizing Canonical Polyadic Decomposition on Sunway Manycore Architecture
Canonical Polyadic Decomposition (CPD) is one of the most popular tensor decomposition methods and plays an important role in big data analysis. For sparse tensors, the major computation in CPD, the matricized tensor times Khatri-Rao product (MTTKRP), exhibits discontiguous memory access and becomes the bottleneck to achieving high performance on emerging processor architectures. In this paper, we propose swCPD, an efficient CPD implementation on the Sunway many-core architecture. The main idea of swCPD is a hierarchical partitioning mechanism. From the computation perspective, the 64 computing processing elements (CPEs) are divided into eight groups, each containing seven workers and one controller. From the data perspective, we partition the sparse tensor at different granularities: blocks, bands, and tiles. Moreover, we develop a mechanism based on register communication for cooperation between CPEs. We evaluate our implementation on both synthesized and real-world datasets, and it achieves better performance than two cutting-edge CPD implementations.
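To make the MTTKRP bottleneck concrete, the following is a minimal reference sketch of a mode-0 MTTKRP on a three-way sparse tensor stored in coordinate (COO) form. It is not the swCPD kernel and does not model the block/band/tile partitioning or the CPE groups. The function name `mttkrp_mode0`, the COO layout, and the toy data are assumptions made for illustration only. The sketch shows why memory access is discontiguous: each nonzero gathers rows of the factor matrices `B` and `C` at positions dictated by its own indices.

```python
def mttkrp_mode0(coords, vals, B, C, num_rows, rank):
    """Reference mode-0 MTTKRP for a 3-way sparse tensor in COO format.

    Computes M[i][r] = sum over nonzeros (i, j, k, v) of v * B[j][r] * C[k][r].
    Illustrative only; names and data layout are assumptions, not swCPD's.
    """
    M = [[0.0] * rank for _ in range(num_rows)]
    for (i, j, k), v in zip(coords, vals):
        # Rows of B and C are selected by the nonzero's indices, so
        # consecutive nonzeros may touch unrelated memory locations.
        b_row = B[j]
        c_row = C[k]
        m_row = M[i]
        for r in range(rank):
            m_row[r] += v * b_row[r] * c_row[r]
    return M


# Toy 2 x 3 x 2 tensor with three nonzeros and rank-2 factor matrices.
coords = [(0, 0, 1), (0, 2, 0), (1, 1, 1)]
vals = [2.0, 3.0, 4.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 x 2 factor for mode 1
C = [[1.0, 1.0], [2.0, 0.5]]              # 2 x 2 factor for mode 2

M = mttkrp_mode0(coords, vals, B, C, num_rows=2, rank=2)
print(M)  # [[19.0, 20.0], [24.0, 8.0]]
```

In this form, the row of `M` that is updated and the rows of `B` and `C` that are read all change from one nonzero to the next, which is the irregular access pattern that a partitioning scheme would aim to localize.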