Revisiting Distributed Synchronous SGD
arXiv: Learning, Volume abs/1702.05800, 2017.
Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony. In contrast, the synchronous approach is often thought to be impractical due to idle time wasted on waiting ...