Performance Evaluation of Lattice Boltzmann Method for Fluid Simulation on A64FX Processor and Supercomputer Fugaku.
HPC Asia (2022)
Abstract
The lattice Boltzmann method (LBM) has recently become popular as an alternative to Navier-Stokes solvers for large-scale fluid simulations. We conduct a performance study of the LBM on the Arm-based A64FX processor of the supercomputer Fugaku. We compare four data layouts, SoA, AoS, Clustered SoA (CSoA), and CSoA2, and three algorithms for the LBM streaming step: the Pull, Push, and Swap schemes. Performance measurements on a single CMG (Core Memory Group) show that the combination of the CSoA2 layout and the Swap scheme achieves the highest performance, 176 GFLOPS, which corresponds to 11.5% of the single-precision peak performance. Our simulations demonstrate good weak scaling up to 16,384 nodes and achieve 10.9 PFLOPS in single precision. Strong scaling is also good, with parallel efficiencies of 63.9%, 68.3%, and 72.7% for the D3Q15, D3Q19, and D3Q27 velocity models, respectively, when scaling from 512 to 16,384 nodes.
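To make the layout comparison concrete, the following is a minimal sketch of how a distribution value for lattice cell `cell` and velocity direction `q` might be mapped to a linear array offset under the AoS, SoA, and a clustered SoA layout. The function names, the cluster size of 16 (chosen here to match the 16 single-precision lanes of a 512-bit vector), and the exact clustered ordering are illustrative assumptions, not the paper's implementation. CSoA2 is omitted because the abstract does not define it.

```python
def aos_index(cell, q, n_cells, n_q):
    """Array of Structures: all n_q directions of one cell are contiguous."""
    return cell * n_q + q


def soa_index(cell, q, n_cells, n_q):
    """Structure of Arrays: all cells of one direction are contiguous."""
    return q * n_cells + cell


def csoa_index(cell, q, n_cells, n_q, cluster=16):
    """Clustered SoA (assumed form): cells are grouped into blocks of
    `cluster` cells; inside a block the data is ordered as SoA.
    Assumes n_cells is a multiple of `cluster`."""
    block, lane = divmod(cell, cluster)
    return (block * n_q + q) * cluster + lane


if __name__ == "__main__":
    n_cells, n_q = 32, 19  # e.g. a tiny D3Q19 lattice
    for name, fn in [("AoS", aos_index), ("SoA", soa_index), ("CSoA", csoa_index)]:
        # Offsets of direction q=0 for the first three cells
        print(name, [fn(c, 0, n_cells, n_q) for c in range(3)])
```

In this sketch, neighbouring cells of the same direction are `n_q` elements apart in AoS but adjacent in SoA and within each CSoA block, which is the property that typically matters for vectorized loads.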