Toward Accelerated Stencil Computation by Adapting Tensor Core Unit on GPU

International Conference on Supercomputing (2022)

The Tensor Core Unit (TCU) is increasingly adopted in modern high-performance processors, where it is specialized for accelerating general matrix multiplication (GEMM). Thanks to its highly optimized hardware design, the TCU can significantly accelerate the GEMM-based operations widely used in scientific and deep learning applications. However, little work has exploited the TCU to accelerate non-GEMM operations such as stencil computation, which is also important in high-performance computing. To the best of our knowledge, no previous work adapts stencil computation to the TCU efficiently by taking its unique characteristics into account. In this paper, we propose a new method, TCstencil, that adapts the TCU to accelerate stencil computation. Specifically, we redesign the stencil computation as a series of reduction and summation operations in order to leverage the computing power of the TCU. In addition, we propose corresponding optimizations that better exploit the TCU and the memory hierarchy of the GPU. We evaluate our method with different stencils and input mesh sizes on NVIDIA A100 and V100 GPUs. The experimental results demonstrate that our method achieves superior performance compared with state-of-the-art stencil optimization frameworks.
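The abstract does not spell out how the stencil is recast for matrix hardware. As a generic illustration only, and not the TCstencil algorithm itself, the sketch below shows one well-known way a stencil can be phrased as a matrix product: a 1D 3-point stencil is applied by multiplying a banded coefficient matrix with a tile of input columns. All function names here are hypothetical, and the plain-Python `matmul` merely stands in for the matrix multiply-accumulate that a tensor core would perform.

```python
# Illustrative sketch under stated assumptions; not the paper's method.
# A 1D 3-point stencil is computed directly and as a matrix product.


def stencil_direct(x, w):
    """Apply a 3-point stencil with weights w to the interior points of x."""
    return [w[0] * x[i - 1] + w[1] * x[i] + w[2] * x[i + 1]
            for i in range(1, len(x) - 1)]


def banded_matrix(n, w):
    """Build the (n-2) x n matrix whose row i holds w in columns i..i+2."""
    rows = []
    for i in range(n - 2):
        row = [0.0] * n
        row[i:i + 3] = w
        rows.append(row)
    return rows


def matmul(a, b):
    """Plain dense matrix product; a stand-in for a tensor-core MMA."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def stencil_as_matmul(columns, w):
    """Apply the stencil to several input columns with one matrix product.

    `columns` is an n x m matrix; each column is an independent 1D input.
    """
    return matmul(banded_matrix(len(columns), w), columns)
```

Calling `stencil_as_matmul([[v] for v in x], w)` should reproduce `stencil_direct(x, w)` for a single input column. The banded matrix is mostly zeros, so a naive mapping like this one wastes much of the multiply throughput; this is presumably the kind of mismatch a TCU-aware reformulation has to address.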