Accelerating matrix-centric graph processing on GPUs through bit-level optimizations.

Jou-An Chen, Hsin-Hsuan Sung, Xipeng Shen, Nathan R. Tallent, Kevin J. Barker, Ang Li

J. Parallel Distributed Comput. (2023)

Abstract

Even though it is well known that binary values are common in graph applications (e.g., in adjacency matrices), how to exploit this for efficiency has not yet been adequately explored. This paper presents a systematic study of how to unlock the potential of bit-level optimizations for graph computations that involve binary values. It proposes a two-level representation named Bit-Block Compressed Sparse Row (B2SR) and presents a series of optimizations for graph operations on B2SR that use the intrinsics of modern GPUs. It additionally introduces Deep Reinforcement Learning (DRL) as an efficient way to choose the best configuration of the bit-level optimizations on the fly. The DQN-based adaptive tile size selector, with dedicated model training, reaches 68% prediction accuracy. Evaluations on NVIDIA Pascal and Volta GPUs show that the optimizations deliver speedups of up to 40x and 6555x for the essential GraphBLAS kernels SpMV and SpGEMM, respectively, and accelerate GraphBLAS-based BFS by up to 433x, SSSP, PR, and CC by up to 35x, and TC by up to 52x. (c) 2023 Elsevier Inc. All rights reserved.
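To make the two-level idea concrete, here is a minimal Python sketch, not the authors' GPU implementation. It assumes a square graph given as a list of (row, col) edges and a fixed tile size; the function names (build_b2sr, pack_vector, b2sr_spmv) and the exact layout (block-level CSR arrays on the upper level, one bitmask per tile row on the lower level) are illustrative assumptions. With a binary input vector, SpMV reduces to a bitwise AND followed by a population count per tile row, which a GPU kernel would map to an intrinsic such as CUDA's __popc.

```python
from typing import Dict, List, Tuple


def popcount(v: int) -> int:
    # Portable population count (stands in for a GPU popcount intrinsic).
    return bin(v).count("1")


def build_b2sr(n: int, edges: List[Tuple[int, int]], tile: int):
    """Hypothetical B2SR-style layout: CSR over tiles, bitmask rows inside tiles."""
    nbr = (n + tile - 1) // tile  # number of block rows
    tiles: Dict[Tuple[int, int], List[int]] = {}
    for r, c in edges:
        rows = tiles.setdefault((r // tile, c // tile), [0] * tile)
        rows[r % tile] |= 1 << (c % tile)  # set bit for column c in local row
    counts = [0] * nbr
    col_idx: List[int] = []
    bit_tiles: List[List[int]] = []
    # Sorting by (block_row, block_col) keeps col_idx aligned with row_ptr.
    for br, bc in sorted(tiles):
        counts[br] += 1
        col_idx.append(bc)
        bit_tiles.append(tiles[(br, bc)])
    row_ptr = [0]
    for cnt in counts:
        row_ptr.append(row_ptr[-1] + cnt)
    return row_ptr, col_idx, bit_tiles


def pack_vector(x: List[int], tile: int) -> List[int]:
    # Pack a 0/1 vector into one bitmask per block column.
    words = [0] * ((len(x) + tile - 1) // tile)
    for j, v in enumerate(x):
        if v:
            words[j // tile] |= 1 << (j % tile)
    return words


def b2sr_spmv(n, tile, row_ptr, col_idx, bit_tiles, x) -> List[int]:
    """y = A x over (+, *) with binary A and x, via AND + popcount."""
    xw = pack_vector(x, tile)
    y = [0] * n
    for br in range(len(row_ptr) - 1):
        for k in range(row_ptr[br], row_ptr[br + 1]):
            xb = xw[col_idx[k]]  # bits of x covered by this tile's columns
            for local_r, mask in enumerate(bit_tiles[k]):
                r = br * tile + local_r
                if r < n:  # skip padding rows in the last block row
                    y[r] += popcount(mask & xb)
    return y
```

For example, with n = 5, tile size 4, edges [(0,1), (0,2), (1,0), (2,3), (3,4), (4,0)] and x = [1, 0, 1, 0, 1], b2sr_spmv returns [1, 1, 0, 1, 1], i.e., each row's count of neighbors in the frontier. The tile size trades upper-level index overhead against empty bits inside each tile; in the paper, this is the choice made by the DQN-based selector.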

Keywords

GraphBLAS, Bit manipulation, GPU, Sparse matrix, Deep reinforcement learning