A Double Residual Compression Algorithm for Efficient Distributed Learning
AISTATS, pp. 133-143, 2020.
Large-scale machine learning models are often trained with parallel stochastic gradient descent algorithms. However, as the number of workers and the dimension of the model increase, the communication cost of gradient aggregation and model synchronization between the master and worker nodes becomes the major obstacle to efficient learning...