AdaPipe: Optimizing Pipeline Parallelism with Adaptive Recomputation and Partitioning

Zhenbo Sun, Huanqi Cao, Yuanwei Wang, Guanyu Feng, Shengqi Chen, Haojie Wang, Wenguang Chen

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (2024)

Abstract

Large language models (LLMs) have demonstrated powerful capabilities, but their increasing sizes and sequence lengths require huge amounts of memory and therefore larger parallel systems. The broadly adopted pipeline parallelism introduces even heavier and unbalanced memory consumption. Recomputation is a widely employed technique to mitigate the problem, but it introduces extra computation overhead. This paper proposes AdaPipe, which aims to find an optimized recomputation and pipeline stage partitioning strategy. AdaPipe employs adaptive recomputation to maximize memory utilization and reduce the computation cost of each pipeline stage. It also adopts a flexible stage partitioning algorithm to balance the computation between different stages. We evaluate AdaPipe by training two representative models, GPT-3 (175B) and Llama 2 (70B), achieving up to 1.32× and 1.22× speedup on clusters with NVIDIA GPUs and Ascend NPUs, respectively.
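
To make the interaction between recomputation and stage partitioning concrete, the following is a minimal sketch under a toy cost model. It is not the algorithm from the paper. The function names (`stage_cost`, `partition`), the uniform per-layer costs, and the assumption that stage `i` of `p` stages keeps activations for `p - i` in-flight micro-batches (as in a 1F1B-style schedule) are all illustrative assumptions. Each stage keeps as many layer activations as fit in its memory budget and recomputes the rest; a small dynamic program then picks the contiguous layer split that minimizes the slowest stage.

```python
from functools import lru_cache


def stage_cost(n_layers, stage_idx, n_stages, layer_time, recompute_time,
               act_mem, mem_budget):
    """Per-micro-batch time of one stage under a toy cost model (assumption)."""
    # Assumed 1F1B-style behaviour: earlier stages hold more in-flight micro-batches.
    in_flight = n_stages - stage_idx
    # Number of layers whose activations fit in memory; the rest are recomputed.
    keep = min(n_layers, int(mem_budget // (in_flight * act_mem)))
    return n_layers * layer_time + (n_layers - keep) * recompute_time


def partition(n_layers, n_stages, **model):
    """Return (bottleneck_cost, layers_per_stage) minimizing the slowest stage.

    Requires n_layers >= n_stages so that every stage gets at least one layer.
    """
    @lru_cache(maxsize=None)
    def best(stage_idx, remaining):
        stages_left = n_stages - stage_idx
        if stages_left == 1:
            # The last stage takes every remaining layer.
            return stage_cost(remaining, stage_idx, n_stages, **model), (remaining,)
        result = None
        # Leave at least one layer for each later stage.
        for n in range(1, remaining - stages_left + 2):
            cost = stage_cost(n, stage_idx, n_stages, **model)
            rest_cost, rest = best(stage_idx + 1, remaining - n)
            candidate = (max(cost, rest_cost), (n,) + rest)
            if result is None or candidate[0] < result[0]:
                result = candidate
        return result

    return best(0, n_layers)


# Hypothetical numbers chosen only for illustration.
model = dict(layer_time=3, recompute_time=1, act_mem=1, mem_budget=12)
even_bottleneck = max(stage_cost(8, s, 4, **model) for s in range(4))
bottleneck, layers_per_stage = partition(32, 4, **model)
```

With these made-up numbers, an even split of 32 layers over 4 stages is limited by the first stage, which must recompute the most, while the searched split moves layers toward later stages that have memory to spare. Real systems would use profiled per-layer time and memory rather than uniform constants.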
