Accelerating Distributed DNN Training via Transport Layer Scheduling

IEEE Transactions on Parallel and Distributed Systems (2023)

Abstract
Communication scheduling is crucial to accelerating the training of large deep learning models: the transmission order of layer-wise deep neural network (DNN) tensors is determined to achieve better computation-communication overlap. Prior approaches adopt user-level tensor partitioning to enhance priority scheduling with finer granularity. However, the startup time slot inserted before every tensor partition neutralizes this scheduling gain. Tuning hyper-parameters for tensor partitioning is also difficult, especially when the network bandwidth is shared or time-varying in multi-tenant clusters. In this article, we propose Mercury, a simple transport-layer scheduler that moves priority scheduling to the transport layer at packet granularity. Packets with the highest priority in the Mercury buffer are transmitted first, which yields near-optimal overlap between communication and computation. Mercury also leverages immediate aggregation at the transport layer to fully overlap gradient push and pull. We implement Mercury in MXNet and conduct comprehensive experiments on five popular DNN models in various environments. Mercury adapts well to dynamic communication and computation resources. Experiments show that Mercury accelerates training by up to 130% compared to the classical parameter server (PS) architecture and by up to 104% compared to state-of-the-art tensor partitioning methods.
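To make the packet-level priority scheduling idea concrete, the following is a minimal sketch (not Mercury's actual implementation) of a priority buffer: each layer's gradient bytes are split into fixed-size packets tagged with that layer's priority, and the sender always drains the highest-priority packet first. The packet size, layer priorities, and the `send_packet` callback are illustrative assumptions.

```python
import heapq
import itertools

PACKET_BYTES = 4096  # assumed fixed payload size per packet


class PriorityPacketBuffer:
    """Buffer that releases packets in priority order (lower value = higher priority)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # FIFO tie-break within the same priority level

    def enqueue_tensor(self, layer_priority, tensor_bytes):
        """Split one layer's gradient bytes into packets and enqueue them."""
        for offset in range(0, len(tensor_bytes), PACKET_BYTES):
            packet = tensor_bytes[offset:offset + PACKET_BYTES]
            heapq.heappush(self._heap, (layer_priority, next(self._seq), packet))

    def drain(self, send_packet):
        """Transmit buffered packets, highest priority first."""
        while self._heap:
            priority, _, packet = heapq.heappop(self._heap)
            send_packet(priority, packet)


# Usage sketch: layers nearest the input get the highest priority (0),
# since their parameters are needed first by the next forward pass.
buf = PriorityPacketBuffer()
buf.enqueue_tensor(layer_priority=2, tensor_bytes=b"\x00" * 10000)  # deeper layer
buf.enqueue_tensor(layer_priority=0, tensor_bytes=b"\x01" * 10000)  # first layer
buf.drain(lambda p, pkt: print(f"sent {len(pkt)}-byte packet with priority {p}"))
```

Because scheduling happens per packet rather than per tensor partition, a high-priority layer can preempt an already-enqueued low-priority layer without any user-level partitioning hyper-parameters.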
Keywords
Computation-communication overlap, distributed machine learning, parameter server, transport layer scheduling