Capuchin: Tensor-based GPU Memory Management for Deep Learning

ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 2020.

Abstract
In recent years, deep learning has achieved unprecedented success across a variety of domains, driven largely by ever larger and deeper deep neural networks (DNNs) that reach very high accuracy. On the other hand, because GPU global memory is a scarce resource, these large models pose a significant memory challenge during training, which restricts the flexibility of DNN architecture exploration. In this paper, we propose Capuchin, a tensor-based GPU memory management module that reduces the memory footprint via tensor eviction/prefetching and recomputation. The key feature of Capuchin is that it makes memory management decisions based on dynamic tensor access patterns tracked at runtime. This design is motivated by the observation that tensor accesses are regular across training iterations. Based on the identified patterns, Capuchin can exploit the full memory optimization space and offer fine-grained, flexible control over when and how to apply memory optimization techniques. We deploy Capuchin in a widely used deep learning framework, Tensorflow, and show that Capuchin reduces the memory footprint by up to 85% across 6 state-of-the-art DNNs compared to the original Tensorflow. In particular, for the NLP task BERT, the maximum batch size that Capuchin can support outperforms Tensorflow and gradient-checkpointing by 7x and 2.1x, respectively. We also show that Capuchin outperforms vDNN and gradient-checkpointing by up to 286% and 55% under the same memory oversubscription.
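To illustrate the idea of deciding between eviction/prefetching and recomputation from observed tensor access patterns, the following is a minimal, hypothetical sketch. The names (MemoryPlanner, AccessRecord, on_access, plan) and the cost model are assumptions for illustration, not the paper's actual API or algorithm.

```python
# Hypothetical sketch of tensor-based memory planning: track accesses during a
# measured iteration, then choose swap vs. recompute per tensor. Not Capuchin's API.
from dataclasses import dataclass, field

@dataclass
class AccessRecord:
    tensor_id: str
    access_times: list = field(default_factory=list)  # access timestamps (ms) in one iteration
    size_mb: float = 0.0
    recompute_cost_ms: float = 0.0  # assumed time to regenerate the tensor from its inputs
    swap_cost_ms: float = 0.0       # assumed time to copy the tensor out and back over PCIe

class MemoryPlanner:
    """Track per-tensor access patterns, then decide for each tensor whether to
    evict+prefetch (swap) or drop+recompute before its next use."""

    def __init__(self):
        self.records = {}

    def on_access(self, tensor_id, timestamp, size_mb, recompute_cost_ms, swap_cost_ms):
        rec = self.records.setdefault(
            tensor_id,
            AccessRecord(tensor_id, size_mb=size_mb,
                         recompute_cost_ms=recompute_cost_ms,
                         swap_cost_ms=swap_cost_ms))
        rec.access_times.append(timestamp)

    def plan(self, min_idle_ms=5.0):
        """Return {tensor_id: action}. Tensors with a long gap between accesses
        are candidates; pick the cheaper of swapping or recomputing."""
        plan = {}
        for rec in self.records.values():
            if len(rec.access_times) < 2:
                continue  # accessed only once; evicting it saves nothing later
            idle_ms = rec.access_times[-1] - rec.access_times[0]
            if idle_ms < min_idle_ms:
                continue  # reused too soon to hide the eviction cost
            if rec.swap_cost_ms <= rec.recompute_cost_ms:
                plan[rec.tensor_id] = "swap"        # evict now, prefetch before reuse
            else:
                plan[rec.tensor_id] = "recompute"   # drop now, regenerate at reuse
        return plan

# Usage: feed accesses observed in one training iteration, then query the plan.
planner = MemoryPlanner()
planner.on_access("conv1_out", timestamp=0.0,  size_mb=512, recompute_cost_ms=3.0, swap_cost_ms=8.0)
planner.on_access("conv1_out", timestamp=40.0, size_mb=512, recompute_cost_ms=3.0, swap_cost_ms=8.0)
print(planner.plan())  # {'conv1_out': 'recompute'}
```

Because the access pattern repeats across training iterations, such a plan measured once can be reused for the remaining iterations, which is the premise behind making these decisions at runtime rather than from static graph analysis.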
Keywords
deep learning training,GPU memory management,tensor access