Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
arXiv: Computer Vision and Pattern Recognition, Volume abs/1706.02677, 2017.
Deep learning thrives with large neural networks and large datasets. However, larger networks and larger datasets result in longer training times that impede research and development progress. Distributed synchronous SGD offers a potential solution to this problem by dividing SGD minibatches over a pool of parallel workers. Yet to make th...
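To make the idea of dividing an SGD minibatch over a pool of parallel workers concrete, here is a minimal single-process sketch. It is not the paper's implementation: the least-squares objective, the function names `mse_gradient` and `synchronous_sgd_step`, and the use of a plain average in place of a real all-reduce are all assumptions chosen for illustration.

```python
import numpy as np


def mse_gradient(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)


def synchronous_sgd_step(w, X, y, num_workers, lr):
    """One synchronous SGD step with the minibatch split evenly over workers.

    Workers are simulated sequentially in one process (an assumption of this
    sketch); a real system would run them in parallel on separate devices.
    """
    if len(y) % num_workers != 0:
        raise ValueError("minibatch size must be divisible by num_workers")
    X_shards = np.split(X, num_workers)
    y_shards = np.split(y, num_workers)
    # Each simulated worker computes a gradient on its own shard only.
    local_grads = [mse_gradient(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    # Averaging the local gradients stands in for an all-reduce across workers.
    global_grad = np.mean(local_grads, axis=0)
    # Every worker would apply the same update, keeping the weights in sync.
    return w - lr * global_grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(32, 4))
    y = rng.normal(size=32)
    w = np.zeros(4)
    w_next = synchronous_sgd_step(w, X, y, num_workers=8, lr=0.1)
    print(w_next)
```

Because the shards are equal in size, the average of the per-worker gradients equals the gradient over the whole minibatch, so in this sketch the step matches what a single worker would compute on all 32 examples.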