Mercury: A Simple Transport Layer Scheduler to Accelerate Distributed DNN Training

IEEE Conference on Computer Communications (INFOCOM), 2022

Abstract
Communication scheduling is crucial to improving the efficiency of training large deep learning models with data parallelism, in which the transmission order of layer-wise deep neural network (DNN) tensors is determined for a better computation-communication overlap. Prior approaches adopt tensor partitioning to enhance priority scheduling with finer granularity. However, a startup time slot inserted before each tensor partition neutralizes this scheduling gain. Tuning the optimal partition size is difficult, and application-layer solutions cannot eliminate the partitioning overhead. In this paper, we propose Mercury, a simple transport layer scheduler that does not partition tensors but instead moves priority scheduling to the transport layer at packet granularity. The packets with the highest priority in the Mercury buffer are transmitted first. Mercury achieves near-optimal overlap between communication and computation. It leverages immediate aggregation at the transport layer to enable coincident gradient push and parameter pull. We implement Mercury in MXNet and conduct comprehensive experiments on five DNN models in an 8-node cluster with 10Gbps Ethernet. Experimental results show that Mercury achieves about 1.18x to 2.18x speedup over vanilla MXNet, and 1.08x to 2.04x speedup over the state-of-the-art tensor partitioning solution.
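To illustrate the core idea of packet-granularity priority scheduling, the following is a minimal sketch, not taken from the paper: a transport-layer send buffer that always releases the highest-priority gradient packet next. The class name, the priority rule (lower layer index means higher priority, since early layers are needed first by the next forward pass), and the packet contents are assumptions for illustration only.

```python
# Hypothetical sketch of a packet-granularity priority buffer, of the kind a
# transport-layer scheduler like Mercury is described as maintaining.
# All names and the priority rule are illustrative assumptions.
import heapq
import itertools

class PriorityPacketBuffer:
    """Holds outgoing gradient packets; always releases the highest-priority one."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def push(self, layer_priority: int, packet: bytes) -> None:
        # Smaller priority value = transmitted sooner (assumed: earlier layers first).
        heapq.heappush(self._heap, (layer_priority, next(self._seq), packet))

    def pop(self) -> bytes:
        # Called whenever the link can accept another packet.
        _, _, packet = heapq.heappop(self._heap)
        return packet

# Usage: a packet from layer 0 preempts a packet from layer 5,
# even though the layer-5 packet was enqueued first.
buf = PriorityPacketBuffer()
buf.push(layer_priority=5, packet=b"grad-layer5-chunk0")
buf.push(layer_priority=0, packet=b"grad-layer0-chunk0")
assert buf.pop() == b"grad-layer0-chunk0"
```

Because prioritization happens per packet rather than per tensor partition, no startup slot is inserted before each chunk, which is the overhead the paper attributes to application-layer partitioning approaches.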
Keywords
simple transport layer scheduler, training