On Implementing Sparse Matrix Multi-vector Multiplication on GPUs

HPCC/CSS/ICESS (2014)

Abstract
Sparse matrix-vector and multi-vector multiplications (SpMV and SpMM) are performance-bottleneck operations in numerous HPC applications. A variety of SpMV GPU kernels using different matrix storage formats have been developed to accelerate these applications. Unlike SpMV, where each matrix element is accessed only once, multiplying by k vectors requires accessing each matrix element k times. In this paper we explore the design of efficient GPU SpMM kernels that exploit two common matrix sparsity patterns: diagonal matrices and matrices with uniform row lengths. Our kernels use GPU registers to exploit the potential data reuse in SpMM. To evaluate the performance of our SpMM kernels, we use 28 structured-grid matrices and 29 matrices with uniform row lengths. Executing on NVIDIA's Kepler-based Tesla K20 GPU, for structured-grid matrices the average speedup over the best-performing state-of-the-art SpMV kernels, including NVIDIA's, is 2.3x and the maximum is 4.6x. For unstructured-mesh matrices, the average speedup is 2.6x and the maximum is 4.1x. Compared to NVIDIA's cuSPARSE SpMM kernel, the average speedup is 1.8x and the maximum is 2.4x for structured-grid matrices; for unstructured-mesh matrices, the average speedup is 1.5x and the maximum is 2.6x.
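
To make the register-reuse idea concrete, the following is a minimal CUDA sketch of an ELLPACK-style SpMM kernel for the uniform-row-length case: each thread processes one matrix row, loads each nonzero into a register once, and reuses it across all k dense vectors instead of re-reading it k times as k independent SpMV calls would. This is an illustration under assumed names and layouts (ell_val, ell_col, column-major dense blocks, k fixed at compile time), not the authors' actual kernel.

// Illustrative sketch only; array names and layouts are assumptions.
#include <cuda_runtime.h>

#define K 4  // number of dense vectors, assumed known at compile time

// A: n x n sparse matrix with uniform row length L, stored column-major in
// ELL arrays ell_val/ell_col of size n*L. X and Y are n x K, column-major.
__global__ void spmm_ell_reg(int n, int L,
                             const double *ell_val, const int *ell_col,
                             const double *X, double *Y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n) return;

    double acc[K];                       // per-vector accumulators in registers
    for (int v = 0; v < K; ++v) acc[v] = 0.0;

    for (int j = 0; j < L; ++j) {
        double a = ell_val[j * n + row]; // matrix element loaded once ...
        int    c = ell_col[j * n + row];
        for (int v = 0; v < K; ++v)      // ... and reused across all K vectors
            acc[v] += a * X[v * n + c];
    }
    for (int v = 0; v < K; ++v)
        Y[v * n + row] = acc[v];
}

With K fixed at compile time, acc[] stays in registers and the inner loop can be fully unrolled, so each nonzero costs one global load regardless of the number of vectors.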
Keywords
kernel, vectors, sparse matrices, instruction sets, registers