Accelerating Deep Learning Using Interconnect-Aware UCX Communication for MPI Collectives

IEEE Micro (2022)

Abstract
Deep learning workloads on modern multi-graphics processing unit (GPU) nodes are highly dependent on intranode interconnects, such as NVLink and PCIe, for high-performance communication. In this article, we take on the challenge to design an interconnect-aware multipath GPU-to-GPU communication using unified communication X (UCX) to utilize all available bandwidth for both NVLink-based systems and...
Keywords
Graphics processing units, Bandwidth, Deep learning, Data transfer, Topology, Sockets, Runtime