Mixed precision applied on common mathematical procedures over GPU

Anais do XXIII Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD 2022)

Abstract
Approximate Computing is a paradigm researchers use as an alternative response to the slowing growth of hardware performance in the ongoing race for computational throughput in HPC. Precision reduction and mixed precision are the most studied of its techniques. In addition, some NVIDIA GPUs include the Tensor Core architecture to speed up certain classes of algorithms, such as matrix multiplication. This study applies Approximate Computing techniques, such as mixed precision, to matrix multiplication and stencil algorithms using OpenACC directives and the cuTensor library, in order to analyze performance gains versus accuracy losses. Results show that an optimized matrix multiplication, provided by the matmul intrinsic function running on Tensor Cores with 16-bit floating-point data, achieved a speedup of 16.60× over a naive version using 64-bit floating-point. For this same case, the accuracy loss grew from approximately 10⁻²⁶ to 10⁻¹. For the stencil algorithm, a gain of 1.60× was obtained by merely reducing variable precision from 64-bit to 16-bit floating-point, with an accuracy loss between 0 and 10⁻⁹ over 300 iterations.
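For illustration, the core mixed-precision idea the abstract describes (multiplying 16-bit floating-point matrices while accumulating in higher precision on Tensor Cores) can be sketched in CUDA C++ with cuBLAS's cublasGemmEx. This is a minimal sketch of the general technique, not the paper's code: the study itself uses Fortran's matmul intrinsic with OpenACC and cuTensor, and the matrix size n and fill values here are arbitrary choices for the example.

```cuda
// Minimal sketch: FP16 inputs with FP32 accumulation via cublasGemmEx.
// cuBLAS dispatches this type combination to Tensor Cores on GPUs that
// support them. Compile with: nvcc -lcublas mixed_gemm.cu
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1024;                      // square matrices for simplicity
    std::vector<__half> hA(n * n), hB(n * n);
    std::vector<float>  hC(n * n, 0.0f);
    for (int i = 0; i < n * n; ++i) {
        hA[i] = __float2half(1.0f / 3.0f);   // values that expose FP16 rounding
        hB[i] = __float2half(3.0f);
    }

    __half *dA, *dB; float *dC;
    cudaMalloc(&dA, n * n * sizeof(__half));
    cudaMalloc(&dB, n * n * sizeof(__half));
    cudaMalloc(&dC, n * n * sizeof(float));
    cudaMemcpy(dA, hA.data(), n * n * sizeof(__half), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), n * n * sizeof(__half), cudaMemcpyHostToDevice);
    cudaMemcpy(dC, hC.data(), n * n * sizeof(float),  cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    // FP16 A and B, FP32 C: inputs are stored in half precision, but the
    // dot products are accumulated in single precision (CUBLAS_COMPUTE_32F).
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                 &alpha,
                 dA, CUDA_R_16F, n,
                 dB, CUDA_R_16F, n,
                 &beta,
                 dC, CUDA_R_32F, n,
                 CUBLAS_COMPUTE_32F,
                 CUBLAS_GEMM_DEFAULT);

    cudaMemcpy(hC.data(), dC, n * n * sizeof(float), cudaMemcpyDeviceToHost);
    // Each exact entry would be n; the difference shows the accuracy loss
    // incurred by storing the inputs in 16-bit floating-point.
    printf("C[0] = %f (exact would be %d)\n", hC[0], n);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Comparing C[0] against the exact value n mirrors, in miniature, the abstract's performance-versus-accuracy trade-off: the GEMM runs on Tensor Cores at FP16 input precision, and the printed residual is the price paid for the reduced-precision storage.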