Training Deep Nets with Sublinear Memory Cost
arXiv: Learning, Volume abs/1604.06174, 2016.
We propose a systematic approach to reduce the memory consumption of deep neural network training. Specifically, we design an algorithm that costs O(√n) memory to train an n-layer network, at the computational cost of only one extra forward pass per mini-batch. As many of the state-of-the-art models hit the upper bound of the GPU memory...
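The core idea behind the O(√n) memory cost is gradient checkpointing: during the forward pass, store activations only at every ~√n-th layer, then recompute each segment's intermediate activations from its checkpoint during the backward pass. The sketch below illustrates this on a toy stack of elementwise tanh layers; the layer definitions, names, and loss are illustrative assumptions, not the paper's general algorithm.

```python
import math
import numpy as np

# Toy layer: y = tanh(x + b). Both forward and backward are cheap,
# so recomputation only costs one extra forward pass overall.
def layer_fwd(x, b):
    return np.tanh(x + b)

def layer_bwd(x, b, grad_out):
    y = np.tanh(x + b)
    grad_in = grad_out * (1.0 - y * y)  # d(tanh)/dx = 1 - tanh^2
    return grad_in, grad_in.copy()      # grad wrt input, grad wrt b

def backprop_checkpointed(x0, biases):
    """Backprop storing only ~sqrt(n) segment-boundary activations."""
    n = len(biases)
    k = max(1, math.isqrt(n))  # segment length ~ sqrt(n)

    # Forward: keep only checkpoints (inputs to segment boundaries).
    checkpoints = {0: x0}
    x = x0
    for i, b in enumerate(biases):
        x = layer_fwd(x, b)
        if (i + 1) % k == 0 and (i + 1) < n:
            checkpoints[i + 1] = x

    loss = 0.5 * np.sum(x * x)  # toy quadratic loss
    grad = x                    # d(loss)/d(output)
    grads_b = [None] * n

    # Backward: recompute each segment's activations from its checkpoint.
    end = n
    for start in sorted(checkpoints, reverse=True):
        xs = [checkpoints[start]]            # input to layer `start`
        for i in range(start, end - 1):
            xs.append(layer_fwd(xs[-1], biases[i]))
        for i in range(end - 1, start - 1, -1):
            grad, grads_b[i] = layer_bwd(xs[i - start], biases[i], grad)
        end = start
    return loss, grads_b
```

A full-memory baseline that stores all n activations produces identical gradients; the checkpointed version trades one extra forward pass for O(√n) peak activation memory.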