Optimizations of H-matrix-vector Multiplication for Modern Multi-core Processors

2022 IEEE International Conference on Cluster Computing (CLUSTER)(2022)

Abstract
Hierarchical matrices (H-matrices) can robustly approximate the dense matrices that appear in the boundary element method (BEM). To accelerate the solution of linear systems in the BEM, we must speed up the matrix-vector multiplication inside the iterative linear solver. However, speed-up approaches are usually developed for dense or sparse matrices and are rarely reported for hierarchical matrix-vector multiplication (HiMV). The HiMV algorithm generates a large number of small matrix-vector multiplications, which have not been sufficiently studied, so the efficiency of HiMV has not reached its potential. This paper discusses optimization methodologies of HiMV for modern multi-core CPUs: an H-matrix storage method for efficient memory access, a method that avoids write contentions during reduction operations on the solution vector, an inter-thread load-balancing method, and blocking and sub-matrix sorting methods for cache efficiency. We demonstrate that these optimizations significantly improve performance on modern CPU-based supercomputers. Relative to the target performance of dense matrix-vector multiplication (DGEMV), the HiMV flop rate reached 84.8%, 100.7%, and 98.7% during single-socket execution on the A64FX, AMD EPYC, and Intel Xeon Cascade Lake processors, respectively. Optimizing memory performance and cache efficiency is especially important for the A64FX with its high-speed, high-bandwidth memory.
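The abstract's central object, HiMV, reduces to a stream of small matrix-vector products over the leaf blocks of the H-matrix: dense near-field blocks get a direct GEMV, while low-rank far-field blocks use two skinny GEMVs. The following is a minimal sketch of this idea, not the paper's implementation; the flat leaf list, the dictionary layout, and the `himv` name are assumptions for illustration (a real H-matrix stores a block cluster tree, and the paper's optimizations concern how these leaves are stored, ordered, and distributed across threads).

```python
import numpy as np

def himv(leaves, x, n_rows):
    """Compute y = A @ x from the leaf blocks of an H-matrix (sketch).

    Each leaf covers rows [r0, r1) and columns [c0, c1) of A and is either
    a dense block D or a low-rank factorization (U, V) with block ~= U @ V.T.
    """
    y = np.zeros(n_rows)
    for leaf in leaves:
        r0, r1 = leaf["rows"]
        c0, c1 = leaf["cols"]
        xs = x[c0:c1]
        if leaf["kind"] == "dense":
            # Near-field block: one small dense GEMV.
            y[r0:r1] += leaf["D"] @ xs
        else:
            # Far-field block: two skinny GEMVs, cost O(k(m+n)) for rank k.
            y[r0:r1] += leaf["U"] @ (leaf["V"].T @ xs)
    return y
```

The `y[r0:r1] +=` accumulation is exactly where the write contentions mentioned in the abstract arise once leaves are processed by multiple threads, since leaves in the same block row reduce into the same slice of `y`.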
Keywords
Parallel computing, hierarchical matrices, performance evaluations, A64FX, AMD Rome, Intel Xeon Cascade-Lake