AutoPipe: A Fast Pipeline Parallelism Approach with Balanced Partitioning and Micro-batch Slicing

2022 IEEE International Conference on Cluster Computing (CLUSTER)

Abstract
Recently, pipeline parallelism has been widely used to train large DNN models. However, two main challenges remain for efficient pipeline parallelism: i) a balanced model partition is crucial for pipeline efficiency, yet prior works lack a sound solution for generating a balanced partition automatically; ii) startup overhead is inevitable and especially significant for deep pipelines, making it an essential source of pipeline bubbles that severely limits pipeline scalability. We propose AutoPipe to address these two problems. It contains i) a planner that automatically and quickly generates a balanced pipeline partition scheme with a fine-grained partitioner, which groups the DNN at sub-layer granularity and finds a balanced scheme with a heuristic search algorithm; and ii) a micro-batch slicer that, guided by the planner's results, reduces pipeline startup overhead by splitting micro-batches evenly; the slicer automatically determines an appropriate number of slices per micro-batch. Experimental results show that AutoPipe accelerates training by up to 1.30x over the state-of-the-art distributed training framework Megatron-LM, with a 50% reduction in startup overhead and an order-of-magnitude reduction in pipeline planning time. Furthermore, AutoPipe Planner improves partition balance by 2.73x-12.7x compared to DAPPLE Planner and Piper.
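The abstract only outlines the planner and the slicer, so the Python sketch below is purely illustrative, not the authors' algorithm. It shows the two ideas in miniature: a greedy heuristic that splits per-sub-layer costs into contiguous pipeline stages with near-equal loads, and a simple rule for choosing how many slices to cut each micro-batch into. The function names (`balance_partition`, `slice_count`), the greedy rule, and the slice-count heuristic are all hypothetical assumptions made for this sketch.

```python
def balance_partition(costs, num_stages):
    """Greedily split per-sub-layer costs (e.g. profiled forward+backward
    times) into contiguous stages whose loads stay near the average.
    Illustrative heuristic only, not AutoPipe's actual search algorithm."""
    target = sum(costs) / num_stages
    stages, load = [[]], 0.0
    for i, cost in enumerate(costs):
        layers_left = len(costs) - i
        stages_left = num_stages - len(stages)
        # Must open a new stage if every remaining stage needs one layer.
        must_open = layers_left == stages_left
        # Otherwise open one when the current stage has hit the target load.
        want_open = bool(stages[-1]) and load + cost > target
        if stages_left > 0 and (must_open or want_open):
            stages.append([])
            load = 0.0
        stages[-1].append(i)
        load += cost
    return stages


def slice_count(num_stages, micro_batch_size, min_slice_size=1):
    """Pick how many slices to cut each micro-batch into. Hypothetical
    rule: the startup bubble grows with pipeline depth, so aim for one
    slice per stage, bounded so each slice keeps >= min_slice_size samples."""
    return max(1, min(num_stages, micro_batch_size // min_slice_size))


if __name__ == "__main__":
    # Made-up per-sub-layer costs standing in for profiling measurements.
    costs = [3.0, 1.0, 2.0, 2.0, 1.0, 3.0, 2.0, 2.0]
    stages = balance_partition(costs, num_stages=4)
    print(stages)  # [[0, 1], [2, 3], [4, 5], [6, 7]] -> load 4.0 per stage
    print(slice_count(num_stages=4, micro_batch_size=8))  # 4 slices
```

In this toy run the greedy partitioner reaches a perfectly even 4.0 load per stage; the paper's planner targets the same balance objective but searches over sub-layer groupings rather than assuming a fixed cost list.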
Keywords
artificial neural networks, distributed system