Lattice Boltzmann for Large-Scale GPU Systems.

APPLICATIONS, TOOLS AND TECHNIQUES ON THE ROAD TO EXASCALE COMPUTING(2011)

引用 7|浏览3
暂无评分
摘要
We describe the enablement of the Ludwig lattice Boltzmann parallel fluid dynamics application, designed specifically for complex problems, for massively parallel GPU-accelerated architectures. NVIDIA CUDA is introduced into the existing C/MPI framework, and we have given careful consideration to maintainability in addition to performance. Significant performance gains are realised on each GPU through restructuring of the data layout to allow memory coalescing and the adaptation of key loops to reduce off-chip memory accesses. The halo-swap communication phase has been designed to efficiently utilise many GPUs in parallel: included is the overlapping of several stages using CUDA stream functionality. The new GPU adaptation is seen to retain the good scaling behaviour of the original CPU code, and scales well up to 256 NVIDIA Fermi GPUs (the largest resource tested). The performance on the NVIDIA Fermi GPU is observed to be up to a factor of 4 greater than the (12-core) AMD Magny-Cours CPU (with all cores utilised) for a binary fluid benchmark.
更多
查看译文
关键词
GPU,Lattice Boltzmann,CUDA,MPI,Optimisation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要