Canary: Decentralized Distributed Deep Learning Via Gradient Sketch and Partition in Multi-Interface Networks

Qihua Zhou,Kun Wang,Haodong Lu,Wenyao Xu,Yanfei Sun,Song Guo

IEEE Transactions on Parallel and Distributed Systems（2021）

引用 7|浏览33

暂无评分

摘要

The multi-interface networks are efficient infrastructures to deploy distributed Deep Learning (DL) tasks as the model gradients generated by each worker can be exchanged to others via different links in parallel. Although this decentralized parameter synchronization mechanism can reduce the time of gradient exchange, building a high-performance distributed DL architecture still requires the balance of communication efficiency and computational utilization, i.e., addressing the issues of traffic burst, data consistency, and programming convenience. To achieve this goal, we intend to asynchronously exchange gradient pieces without the central control in multi-interface networks. We propose the Piece-level Gradient Exchange and Multi-interface Collective Communication to handle parameter synchronization and traffic transmission, respectively. Specifically, we design the gradient sketch approach based on 8-bit uniform quantization to compress gradient tensors and introduce the colayerabstraction to better handle gradient partition, exchange and pipelining. Also, we provide general programming interfaces to capture the synchronization semantics and build the Gradient Exchange Index (GEI) data structures to make our approach online applicable. We implement our algorithms into a prototype system called Canary by using PyTorch-1.4.0. Experiments conducted in Alibaba Cloud demonstrate that Canary reduces 56.28 percent traffic on average and completes the training by up to 1.61x, 2.28x, and 2.84x faster than BML, Ako on PyTorch, and PS on TensorFlow, respectively.

查看译文

关键词

Distributed systems,multi-interface network,deep learning,gradient sketch,decentralized architecture

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要