A Hierarchical Communication Algorithm for Distributed Deep Learning Training
Midwest Symposium on Circuits and Systems (2023)
Abstract
Distributed deep learning training has become an important workload on data center GPU clusters. However, inter-node bandwidth is often limited (e.g., 20 Gbps) and thus becomes a performance bottleneck when existing deep learning systems scale training across multiple nodes. To address this bottleneck, we propose AS-SGD, a hierarchical communication algorithm that combines Asynchronous SGD and Synchronous SGD to make full use of both inter-node and intra-node network bandwidth. Moreover, we apply a set of system optimization techniques, such as quantization and decentralization, to further reduce communication costs. Finally, we present a performance evaluation of our algorithm on a 4-node cluster (each node with 8 Nvidia Tesla V100 GPUs). Experiments show that our algorithm achieves up to a 4.95X speedup over existing state-of-the-art systems on popular deep learning models and datasets.
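The abstract only sketches the hierarchical scheme at a high level. Below is a minimal, self-contained Python/NumPy sketch of one plausible reading: GPUs on the same node average gradients synchronously (standing in for a fast intra-node all-reduce), while nodes exchange quantized parameter deltas asynchronously over the slow inter-node link. All names, the toy objective, and the 8-bit quantizer are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def quantize_int8(x):
    """Uniform 8-bit quantization with a per-tensor scale (an assumed
    stand-in for the paper's unspecified quantization technique)."""
    scale = np.max(np.abs(x)) / 127.0 + 1e-12
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

class Node:
    """One machine holding several GPU workers and a local model replica."""
    def __init__(self, params, num_gpus):
        self.params = params.copy()
        self.num_gpus = num_gpus

    def local_step(self, grad_fn, lr):
        # Intra-node synchronous step: average the gradients of all GPUs,
        # mimicking a fast NVLink/PCIe all-reduce inside one node.
        grads = [grad_fn(self.params) for _ in range(self.num_gpus)]
        self.params -= lr * np.mean(grads, axis=0)

def train(num_nodes=4, gpus_per_node=8, steps=100, lr=0.1):
    rng = np.random.default_rng(0)
    target = rng.normal(size=8)                     # toy problem: recover `target`
    global_params = np.zeros(8, dtype=np.float32)   # shared inter-node copy
    nodes = [Node(global_params, gpus_per_node) for _ in range(num_nodes)]

    def grad_fn(p):
        # Noisy gradient of 0.5 * ||p - target||^2, one sample per GPU.
        return (p - target) + rng.normal(scale=0.1, size=p.shape)

    for step in range(steps):
        # No global barrier across nodes; picking one node per iteration
        # crudely mimics asynchronous inter-node progress.
        node = nodes[step % num_nodes]
        node.local_step(grad_fn, lr)

        # Inter-node asynchronous exchange: push a quantized delta to the
        # shared copy, then pull the (possibly stale) parameters back.
        delta = node.params - global_params
        q, scale = quantize_int8(delta)
        global_params += dequantize_int8(q, scale)
        node.params = global_params.copy()

    return global_params, target

if __name__ == "__main__":
    params, target = train()
    print("error:", np.linalg.norm(params - target))
```

The key design point the sketch illustrates is the asymmetry: the expensive, frequent gradient averaging happens only over fast intra-node links, while the slow inter-node link carries infrequent, quantized deltas and never forces a global barrier.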
Keywords
Deep Learning, Distributed Training, Computer Network