A Hierarchical Communication Algorithm for Distributed Deep Learning Training.

Jiayu Zhang, Shaojun Cheng, Feng Dong, Ke Chen, Yong Qiao, Zhigang Mao, Jianfei Jiang

Midwest Symposium on Circuits and Systems (2023)

Abstract
Distributed deep learning training has become an important workload on data center GPU clusters. However, in some cases the inter-node bandwidth is limited (e.g., 20 Gbps) and becomes a performance bottleneck when existing deep learning systems scale training across multiple nodes. To address this bottleneck, we propose a hierarchical communication algorithm, named AS-SGD, that combines Asynchronous SGD and Synchronous SGD to make full use of both inter-node and intra-node network bandwidth. Moreover, a set of system optimization techniques, such as quantization and decentralization, is applied to further reduce communication cost. Finally, we present a performance evaluation of our algorithm on a 4-node cluster (each node with 8 Nvidia Tesla V100 GPUs). Experiments show that our algorithm achieves up to a 4.95X speedup over existing state-of-the-art systems on popular deep learning models and datasets.
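
The abstract does not give the algorithm's details. The sketch below is a single-machine NumPy simulation of the general two-level idea it describes: synchronous averaging inside each node (standing in for a fast intra-node all-reduce) followed by a quantized exchange across nodes (standing in for the slower inter-node link). The 8-bit quantizer, the toy tensor sizes, and the plain node-level averaging are illustrative assumptions, not the authors' implementation.

# Illustrative sketch (NOT the authors' implementation): simulates the two-level
# idea behind AS-SGD on a single machine with NumPy.
#   level 1: synchronous intra-node averaging (stands in for a fast NVLink all-reduce)
#   level 2: quantized exchange between nodes (stands in for the slow inter-node link)
# The 8-bit quantizer, tensor sizes, and plain averaging are assumptions for illustration.

import numpy as np

NUM_NODES = 4        # cluster size used in the paper's evaluation
GPUS_PER_NODE = 8    # Tesla V100 GPUs per node
GRAD_DIM = 1000      # toy gradient size (assumption)

rng = np.random.default_rng(0)

def quantize_int8(x):
    """Uniform 8-bit quantization: return int8 codes plus the scale factor."""
    scale = np.max(np.abs(x)) / 127.0 + 1e-12
    return np.round(x / scale).astype(np.int8), scale

def dequantize_int8(codes, scale):
    return codes.astype(np.float32) * scale

# one gradient per GPU (node-major layout)
grads = rng.normal(size=(NUM_NODES, GPUS_PER_NODE, GRAD_DIM)).astype(np.float32)

# level 1: synchronous intra-node all-reduce (average over the node's GPUs)
node_grads = grads.mean(axis=1)                      # shape: (NUM_NODES, GRAD_DIM)

# level 2: each node quantizes its aggregate before crossing the slow inter-node link
quantized = [quantize_int8(g) for g in node_grads]
received = np.stack([dequantize_int8(c, s) for c, s in quantized])

# decentralized averaging across nodes (a stand-in for the asynchronous exchange;
# the real algorithm would overlap this step with computation instead of waiting)
global_grad = received.mean(axis=0)

exact = grads.mean(axis=(0, 1))
err = np.linalg.norm(global_grad - exact) / np.linalg.norm(exact)
print(f"relative error introduced by 8-bit quantization: {err:.4e}")

Running the sketch prints the small relative error introduced by quantizing the inter-node traffic, which illustrates why compressing only the cross-node messages is an attractive trade when intra-node links are much faster.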
Keywords
Deep Learning, Distributed Training, Computer Network