Accelerating Deep Learning Using Interconnect-Aware UCX Communication for MPI Collectives

Yıltan Hassan Temuçin, Amir Hossein Sojoodi, Pedram Alizadeh, Benjamin Kitor, Ahmad Afsahi

IEEE Micro (2022)

Abstract

Deep learning workloads on modern multi-graphics processing unit (GPU) nodes depend heavily on intranode interconnects, such as NVLink and PCIe, for high-performance communication. In this article, we take on the challenge of designing an interconnect-aware, multipath GPU-to-GPU communication scheme using Unified Communication X (UCX) to utilize all available bandwidth for both NVLink-based systems and...
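The abstract is truncated and does not say how data is divided among paths. As a purely illustrative sketch, not the authors' UCX implementation, one common way to use several intranode paths at once is to split a message in proportion to each path's bandwidth so that all chunks finish at roughly the same time. The function names, path names, and bandwidth figures below are hypothetical placeholders, not values from the paper.

```python
"""Illustrative bandwidth-proportional message splitting across multiple paths.

Hypothetical sketch: path names and bandwidths are placeholders, not values
from the paper, and this is not the authors' UCX implementation.
"""
from typing import Dict


def split_by_bandwidth(size_bytes: int, bandwidth_gbs: Dict[str, float]) -> Dict[str, int]:
    """Divide size_bytes among paths in proportion to bandwidth (GB/s).

    Chunk sizes are whole bytes that sum exactly to size_bytes; the rounding
    remainder is assigned to the highest-bandwidth path.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if not bandwidth_gbs or any(bw <= 0 for bw in bandwidth_gbs.values()):
        raise ValueError("every path needs a positive bandwidth")
    total_bw = sum(bandwidth_gbs.values())
    # Floor each proportional share so the total never exceeds size_bytes.
    chunks = {path: int(size_bytes * bw / total_bw) for path, bw in bandwidth_gbs.items()}
    remainder = size_bytes - sum(chunks.values())
    fastest = max(bandwidth_gbs, key=bandwidth_gbs.get)
    chunks[fastest] += remainder
    return chunks


def estimated_time_s(chunks: Dict[str, int], bandwidth_gbs: Dict[str, float]) -> float:
    """Idealized completion time: paths run concurrently, so the slowest chunk wins.

    Ignores latency, protocol overhead, and contention on shared links.
    """
    return max(chunks[path] / (bandwidth_gbs[path] * 1e9) for path in chunks)


if __name__ == "__main__":
    # Hypothetical intranode paths between two GPUs (placeholder bandwidths).
    paths = {"direct_nvlink": 50.0, "staged_via_third_gpu": 25.0}
    size = 64 * 1024 * 1024  # 64 MiB message
    chunks = split_by_bandwidth(size, paths)
    print(chunks)
    print("multipath estimate (s):", estimated_time_s(chunks, paths))
    print("single direct path (s):", size / (paths["direct_nvlink"] * 1e9))
```

Splitting in proportion to bandwidth balances finish times across paths. A real runtime would also have to account for per-path latency and for staged routes that share links with the direct one, both of which this sketch ignores.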

Keywords

Graphics processing units, Bandwidth, Deep learning, Data transfer, Topology, Sockets, Runtime