Optimizations of H-matrix-vector Multiplication for Modern Multi-core Processors

2022 IEEE International Conference on Cluster Computing (CLUSTER)(2022)

Abstract
Hierarchical matrices (H-matrices) can robustly approximate the dense matrices that appear in the boundary element method (BEM). To accelerate the solution of linear systems in the BEM, we must speed up the matrix-vector multiplication inside the iterative linear solver. However, speed-up approaches are usually developed for dense or sparse matrices and are rarely reported for hierarchical matrix-vector multiplication (HiMV). The HiMV algorithm generates a large number of small matrix-vector multiplications, which have not been sufficiently studied, so the efficiency of HiMV has not reached its potential. This paper discusses optimization methodologies of HiMV for modern multi-core CPUs: an H-matrix storage method for efficient memory access, a method that avoids write contentions during reduction operations on the solution vector, an inter-thread load-balancing method, and blocking and sub-matrix sorting methods for cache efficiency. We demonstrate that these optimizations significantly improve performance on modern CPU-based supercomputers. Relative to the target performance of dense matrix-vector multiplication (DGEMV), the HiMV flop rate reached 84.8%, 100.7%, and 98.7% during single-socket execution on the A64FX, AMD EPYC, and Intel Xeon Cascade Lake processors, respectively. Optimizing memory performance and cache efficiency is especially important for the A64FX with its high-speed, high-bandwidth memory.
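The abstract's central object, HiMV, reduces to a stream of small matrix-vector products over the leaf blocks of the H-matrix: dense near-field blocks get a direct GEMV, while low-rank far-field blocks use two skinny GEMVs. The following is a minimal sketch of this idea, not the paper's implementation; the flat leaf list, the dictionary layout, and the `himv` name are assumptions for illustration (a real H-matrix stores a block cluster tree, and the paper's optimizations concern how these leaves are stored, ordered, and distributed across threads).

```python
import numpy as np

def himv(leaves, x, n_rows):
    """Compute y = A @ x from the leaf blocks of an H-matrix (sketch).

    Each leaf covers rows [r0, r1) and columns [c0, c1) of A and is either
    a dense block D or a low-rank factorization (U, V) with block ~= U @ V.T.
    """
    y = np.zeros(n_rows)
    for leaf in leaves:
        r0, r1 = leaf["rows"]
        c0, c1 = leaf["cols"]
        xs = x[c0:c1]
        if leaf["kind"] == "dense":
            # Near-field block: one small dense GEMV.
            y[r0:r1] += leaf["D"] @ xs
        else:
            # Far-field block: two skinny GEMVs, cost O(k(m+n)) for rank k.
            y[r0:r1] += leaf["U"] @ (leaf["V"].T @ xs)
    return y
```

The `y[r0:r1] +=` accumulation is exactly where the write contentions mentioned in the abstract arise once leaves are processed by multiple threads, since leaves in the same block row reduce into the same slice of `y`.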
Keywords
Parallel computing, hierarchical matrices, performance evaluations, A64FX, AMD Rome, Intel Xeon Cascade-Lake