A Hierarchical Communication Algorithm for Distributed Deep Learning Training
Midwest Symposium on Circuits and Systems (2023)
Abstract
Distributed deep learning training has become an important workload on data center GPU clusters. However, inter-node bandwidth is often limited (e.g., 20 Gbps) and thus becomes a performance bottleneck when existing deep learning systems scale training across multiple nodes. To address this bottleneck, we propose AS-SGD, a hierarchical communication algorithm that combines Asynchronous SGD and Synchronous SGD to make full use of both inter-node and intra-node network bandwidth. Moreover, we apply a set of system optimization techniques, such as quantization and decentralization, to further reduce communication costs. Finally, we present a performance evaluation of our algorithm on a 4-node cluster (each node with 8 Nvidia Tesla V100 GPUs). Experiments show that our algorithm achieves up to a 4.95X speedup over existing state-of-the-art systems on popular deep learning models and datasets.
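The abstract only sketches the hierarchical scheme at a high level. Below is a minimal, self-contained Python/NumPy sketch of one plausible reading: GPUs on the same node average gradients synchronously (standing in for a fast intra-node all-reduce), while nodes exchange quantized parameter deltas asynchronously over the slow inter-node link. All names, the toy objective, and the 8-bit quantizer are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def quantize_int8(x):
    """Uniform 8-bit quantization with a per-tensor scale (an assumed
    stand-in for the paper's unspecified quantization technique)."""
    scale = np.max(np.abs(x)) / 127.0 + 1e-12
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

class Node:
    """One machine holding several GPU workers and a local model replica."""
    def __init__(self, params, num_gpus):
        self.params = params.copy()
        self.num_gpus = num_gpus

    def local_step(self, grad_fn, lr):
        # Intra-node synchronous step: average the gradients of all GPUs,
        # mimicking a fast NVLink/PCIe all-reduce inside one node.
        grads = [grad_fn(self.params) for _ in range(self.num_gpus)]
        self.params -= lr * np.mean(grads, axis=0)

def train(num_nodes=4, gpus_per_node=8, steps=100, lr=0.1):
    rng = np.random.default_rng(0)
    target = rng.normal(size=8)                     # toy problem: recover `target`
    global_params = np.zeros(8, dtype=np.float32)   # shared inter-node copy
    nodes = [Node(global_params, gpus_per_node) for _ in range(num_nodes)]

    def grad_fn(p):
        # Noisy gradient of 0.5 * ||p - target||^2, one sample per GPU.
        return (p - target) + rng.normal(scale=0.1, size=p.shape)

    for step in range(steps):
        # No global barrier across nodes; picking one node per iteration
        # crudely mimics asynchronous inter-node progress.
        node = nodes[step % num_nodes]
        node.local_step(grad_fn, lr)

        # Inter-node asynchronous exchange: push a quantized delta to the
        # shared copy, then pull the (possibly stale) parameters back.
        delta = node.params - global_params
        q, scale = quantize_int8(delta)
        global_params += dequantize_int8(q, scale)
        node.params = global_params.copy()

    return global_params, target

if __name__ == "__main__":
    params, target = train()
    print("error:", np.linalg.norm(params - target))
```

The key design point the sketch illustrates is the asymmetry: the expensive, frequent gradient averaging happens only over fast intra-node links, while the slow inter-node link carries infrequent, quantized deltas and never forces a global barrier.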
Keywords
Deep Learning, Distributed Training, Computer Network