Runtime Composition of Iterations for Fusing Loop-carried Sparse Dependence.

Kazem Cheshmi, Michelle Strout, Maryam Mehri Dehnavi

SC (2023)

Abstract

Dependence between iterations in sparse computations causes inefficient use of memory and computation resources. This paper proposes sparse fusion, a technique that generates efficient parallel code for the combination of two sparse matrix kernels, where at least one of the kernels has loop-carried dependencies. Existing implementations optimize individual sparse kernels separately. However, this approach leads to synchronization overheads and load imbalance due to the irregular dependence patterns of sparse kernels, as well as inefficient cache usage due to their irregular memory access patterns. Sparse fusion uses a novel inspection strategy and code transformation to generate parallel fused code optimized for data locality and load balance. Sparse fusion outperforms the best of unfused implementations using ParSy and MKL by an average of 4.2× and is faster than the best of fused implementations using existing scheduling algorithms, such as LBC, DAGP, and wavefront, by an average of 4× for various kernel combinations.
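
To make the terminology concrete, the sketch below shows one possible pair of sparse kernels of the kind the abstract describes: a sparse lower-triangular solve, whose iterations carry a dependence, followed by a sparse matrix-vector product that consumes its result. It is not the paper's sparse fusion method; the kernels run one after another, unfused. The CSR layout, the function names (`sptrsv_csr`, `spmv_csr`, `wavefront_levels`), and the small example matrix are all assumptions made for illustration. The level computation is a plain serial version of the wavefront idea mentioned as one of the existing schedulers: rows with the same level have no dependence on each other.

```python
# Illustrative sketch only (hypothetical names, not the paper's code).
# Matrices are stored in CSR form as three Python lists.

def sptrsv_csr(indptr, indices, data, b):
    """Solve L x = b for a lower-triangular L in CSR with a nonzero diagonal."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diag = data[k]
            else:
                # Loop-carried dependence: x[j] was written by iteration j < i.
                acc -= data[k] * x[j]
        x[i] = acc / diag
    return x

def spmv_csr(indptr, indices, data, x):
    """Compute y = A x for A in CSR; iterations are independent of each other."""
    n = len(indptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def wavefront_levels(indptr, indices):
    """Level of each row in the dependence DAG of the triangular solve."""
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    return level

# Assumed example: L = [[2, 0, 0], [1, 1, 0], [0, 3, 4]]
indptr = [0, 1, 3, 5]
indices = [0, 0, 1, 1, 2]
data = [2.0, 1.0, 1.0, 3.0, 4.0]
b = [2.0, 2.0, 11.0]

x = sptrsv_csr(indptr, indices, data, b)      # kernel with loop-carried dependence
y = spmv_csr(indptr, indices, data, x)        # second kernel, reads x
levels = wavefront_levels(indptr, indices)
```

In this unfused form, every entry of `x` is written before `spmv_csr` reads any of it, so `x` makes a full round trip through memory between the two kernels. A fused schedule would instead interleave iterations of the two loops while still respecting the dependences that `wavefront_levels` exposes.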

Keywords

Parallelization, Sparse Matrix, Load Balancing, Scheduling Algorithm, Parallel Efficiency, Load Imbalance, Symmetric Matrix, Positive Definite Matrix, Definite Matrix, Directed Acyclic Graph, Triangular Matrix, Temporal Localization, Upper Triangular, Lower Triangular, Loop Iteration, Breadth-first Search, Iterative Solver, Pairing Scheme, Running Example, Workload Balance, Parallel Loops, Chordal
