Dynamic Tensor Linearization and Time Slicing for Efficient Factorization of Infinite Data Streams.

Yongseok Soh, Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Jesmin Jahan Tithi, Teresa M. Ranadive, Fabrizio Petrini, Jee W. Choi

IPDPS (2023)

Abstract

Streaming tensor factorization is an effective tool for unsupervised analysis of time-evolving sparse data, which emerge in many critical domains such as cybersecurity and trend analysis. In contrast to traditional tensors, time-evolving tensors demonstrate extreme sparsity and sparsity variation over time, resulting in irregular memory access and inefficient use of parallel computing resources. Additionally, due to the prohibitive cost of dynamically generating compressed sparse tensor formats, the state-of-the-art approaches process streaming tensors in a raw form that fails to capture data locality and suffers from high synchronization cost. To address these challenges, we propose a new dynamic tensor linearization framework that quickly encodes streaming multi-dimensional data on the fly in a compact representation, which has substantially lower memory usage and higher data reuse and parallelism than the original raw data. This is achieved by using a spatial sketching algorithm that keeps all incoming nonzero elements but remaps them into a tensor sketch with a considerably reduced multi-dimensional image space. Moreover, we present a dynamic time slicing mechanism that uses variable-width time slices (instead of the traditional fixed-width slices) to balance the frequency of factor updates and the utilization of computing resources. We demonstrate the efficacy of our framework by accelerating two high-performance streaming tensor algorithms, namely CP-stream and spCP-stream, and significantly improving their performance for a range of real-world streaming tensors. On a modern 56-core CPU, our framework achieves 10.3-11x and 6.4-7.2x geometric-mean speedup for the CP-stream and spCP-stream algorithms, respectively.
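The two ideas named in the abstract, variable-width time slices and remapping nonzeros into a smaller index space before linearizing them, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the function names `variable_width_slices` and `remap_and_linearize`, the `target_nnz` parameter, the "close a slice once it holds enough nonzeros" policy, and the row-major key are all assumptions chosen only to make the concepts concrete.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# One streamed nonzero: (timestamp, spatial indices, value).
Nonzero = Tuple[int, Tuple[int, ...], float]


def variable_width_slices(
    stream: Iterable[Nonzero], target_nnz: int
) -> Iterator[List[Nonzero]]:
    """Group a time-ordered stream into slices of at least target_nnz nonzeros.

    Hypothetical policy: a slice closes at the first timestamp boundary
    after it has collected target_nnz nonzeros, so sparse periods produce
    wide slices and dense periods produce narrow ones.
    """
    current: List[Nonzero] = []
    last_t = None
    for t, idx, val in stream:
        if current and t != last_t and len(current) >= target_nnz:
            yield current
            current = []
        current.append((t, idx, val))
        last_t = t
    if current:  # flush the final, possibly underfull, slice
        yield current


def remap_and_linearize(
    time_slice: List[Nonzero],
) -> Tuple[List[Tuple[int, float]], List[int]]:
    """Remap each mode to a dense index range, then pack into one integer key.

    Every nonzero is kept; only the index space shrinks to the indices
    that actually occur in this slice. Keys use a plain row-major layout
    (an assumption made for this illustration).
    """
    n_modes = len(time_slice[0][1])
    maps: List[Dict[int, int]] = [{} for _ in range(n_modes)]
    compact = []
    for _, idx, val in time_slice:
        # setdefault assigns the next free dense id on first sight.
        new_idx = tuple(
            maps[m].setdefault(i, len(maps[m])) for m, i in enumerate(idx)
        )
        compact.append((new_idx, val))
    dims = [len(m) for m in maps]  # reduced extent of each mode
    linear = []
    for new_idx, val in compact:
        key = 0
        for m in range(n_modes):
            key = key * dims[m] + new_idx[m]
        linear.append((key, val))
    return linear, dims
```

In this sketch, a stream whose nonzeros arrive in bursts yields slices that cover different time spans but hold similar amounts of work, and each slice is stored as a list of `(key, value)` pairs over a much smaller index range than the raw coordinates.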

Keywords

Sparse tensors, tensor factorization, streaming data, multi-core CPU
