BaPipe: Balanced Pipeline Parallelism for DNN Training

Parallel Processing Letters (2022)

Abstract
The size of deep neural networks (DNNs) grows rapidly as machine learning algorithms become more complex. Distributed deep learning based on model parallelism has been widely used to satisfy the computation and memory requirements of DNN training. In this paper, we propose a pipeline-parallel training framework called BaPipe (Balanced Pipeline) that automatically explores pipeline scheduling methods and balanced partition strategies for DNN training on heterogeneous accelerator clusters. In BaPipe, each accelerator computes the forward and backward propagation for its assigned partition of the network, implementing an intra-batch pipeline parallelism strategy. By considering the parameters of the DNN model as well as the computation, memory, and communication resources of each accelerator, BaPipe automatically selects the most suitable pipeline scheduling method from among multiple proposed scheduling modes. It also uses a novel strategy to automatically explore load-balanced partitioning across inter-layer, intra-layer, and coarse-grained partition schemes. We trained DNNs such as VGG-16, ResNet-50, and Google's Neural Machine Translation (GNMT) model on GPU clusters, and simulated training performance on FPGA clusters. Compared with state-of-the-art data parallelism (DP) and pipeline parallelism frameworks, BaPipe provides a 3.2x speedup and a 4x memory reduction on various homogeneous and heterogeneous platforms.
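To make the balanced inter-layer partitioning idea concrete, the following is a minimal sketch that splits a network's layers into contiguous pipeline stages so that per-stage compute cost is roughly equal. The per-layer costs, the number of stages, and the greedy target-based heuristic are illustrative assumptions for this sketch only; they are not BaPipe's actual partitioning algorithm, which also accounts for memory and communication resources of each accelerator.

```python
# Illustrative sketch: greedy balanced inter-layer partitioning.
# Assumption: only per-layer compute cost is balanced; memory and
# communication constraints (which BaPipe also considers) are ignored here.

def partition_layers(layer_costs, num_stages):
    """Split layer indices into contiguous stages with roughly equal cost."""
    total = sum(layer_costs)
    target = total / num_stages            # ideal cost per pipeline stage
    stages, current, acc = [], [], 0.0
    for i, cost in enumerate(layer_costs):
        current.append(i)
        acc += cost
        remaining_stages = num_stages - len(stages) - 1
        remaining_layers = len(layer_costs) - i - 1
        # Close the stage once it reaches the target, as long as enough
        # layers remain to fill the rest of the pipeline.
        if remaining_stages > 0 and acc >= target and remaining_layers >= remaining_stages:
            stages.append(current)
            current, acc = [], 0.0
    stages.append(current)                 # last stage takes the leftover layers
    return stages

# Hypothetical per-layer forward+backward costs (e.g., in milliseconds).
costs = [3, 4, 4, 2, 5, 4, 3, 5]
print(partition_layers(costs, num_stages=3))
# -> [[0, 1, 2], [3, 4, 5], [6, 7]]  (stage costs 11, 11, 8)
```

In practice, a framework like BaPipe would derive the per-layer costs from profiling or a performance model of each accelerator and would search jointly over scheduling mode and partition granularity rather than using a single greedy pass.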
Keywords
DNN training, pipeline parallelism, load balancing, parallel and distributed systems