Toward Accelerated Stencil Computation by Adapting Tensor Core Unit on GPU

Xiaoyan Liu,Yi Liu,Hailong Yang,Jianjin Liao,Mingzhen Li,Zhongzhi Luan,Depei Qian

International Conference on Supercomputing（2022）

Cited 5|Views61

No score

Abstract

The Tensor Core Unit (TCU) has been increasingly adopted on modern high performance processors, specialized in boosting the performance of general matrix multiplication (GEMM). Due to its highly optimized hardware design, TCU can significantly accelerate GEMM-based operations widely used in scientific as well as deep learning applications. However, there is few work exploiting TCU to accelerate non-GEMM operations such as stencil computation that is also important in the field of high performance computing. To the best of our knowledge, there is no previous work that adapts stencil computation to TCU efficiently by considering its unique characteristics. In this paper, we propose a new method called TCstencil to adapt TCU for accelerating stencil computation. Specifically, we re-design the stencil computation as a series of reduction and summation operations in order to leverage the computing power of TCU. In addition, we propose corresponding optimizations for better exploiting TCU and memory hierarchy on GPU. We evaluate our method with different stencils and input mesh sizes on NVIDIA A100 and V100 GPUs. The experiment results demonstrate our method can achieve superior performance compared to the state-of-the-art stencil optimization frameworks.

Translated text

Key words

stencil computation, performance optimization, tensor core, GPU

AI Read Science

Must-Reading Tree

Example

Generate MRT to find the research sequence of this paper

Chat Paper

Summary is being generated by the instructions you defined