Efficient Personalized and Non-Personalized Alltoall Communication for Modern Multi-HCA GPU-Based Clusters.

HIPC 2022

Abstract
Graphics Processing Units (GPUs) have become ubiquitous in today's supercomputing clusters, primarily because of their high compute capability and power efficiency. The Message Passing Interface (MPI) is a widely adopted programming model for large-scale GPU-based applications on such clusters. Modern GPU-based systems are equipped with multiple Host Channel Adapters (HCAs). Previously, scientists have leveraged multi-HCA systems to accelerate internode transfers between CPUs using point-to-point primitives. In this work, we show the need for collective-level, multi-rail-aware algorithms using MPI Allgather as an example. We then propose an efficient multi-rail MPI Allgather algorithm and extend it to MPI Alltoall. We analyze the performance of these algorithms using the OSU Micro-Benchmarks (OMB) suite, demonstrating approximately 30% and 43% improvements in non-personalized and personalized communication benchmarks, respectively, compared with state-of-the-art MPI libraries on 128 GPUs.
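To illustrate the kind of measurement the OMB suite performs for non-personalized collectives, the following is a minimal sketch of a CUDA-aware MPI_Allgather timing loop on device buffers. The message size, iteration counts, and output format here are assumptions for illustration, not the paper's benchmark configuration, and the underlying MPI library must be CUDA-aware for device pointers to be valid arguments.

/* Minimal CUDA-aware MPI_Allgather timing loop (sketch).
 * Compile with a CUDA-aware MPI, e.g.: mpicc -lcudart allgather_bench.c */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const size_t msg = 1 << 20;        /* 1 MiB per rank (assumed size) */
    const int iters = 100, warmup = 10;

    /* Device buffers; a CUDA-aware MPI can send/receive from these directly. */
    char *sendbuf, *recvbuf;
    cudaMalloc((void **)&sendbuf, msg);
    cudaMalloc((void **)&recvbuf, msg * size);

    double t0 = 0.0;
    for (int i = 0; i < iters + warmup; i++) {
        if (i == warmup) {             /* start timing after warm-up */
            MPI_Barrier(MPI_COMM_WORLD);
            t0 = MPI_Wtime();
        }
        MPI_Allgather(sendbuf, (int)msg, MPI_CHAR,
                      recvbuf, (int)msg, MPI_CHAR, MPI_COMM_WORLD);
    }
    double avg_us = (MPI_Wtime() - t0) / iters * 1e6;
    if (rank == 0)
        printf("Allgather %zu bytes/rank: %.2f us avg\n", msg, avg_us);

    cudaFree(sendbuf);
    cudaFree(recvbuf);
    MPI_Finalize();
    return 0;
}

A multi-rail-aware library would stripe or schedule the underlying internode transfers across the node's HCAs transparently to this call, which is why the same benchmark can expose the collective-level gains the abstract reports.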
Keywords
MPI, DDT, GPU, HCA, Multi-HCA