Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning
IEEE Transactions on Knowledge and Data Engineering (2022)
Abstract
This paper investigates the stochastic optimization problem with a focus on developing scalable parallel algorithms for deep learning tasks. Our solution involves a reformulation of the objective function for stochastic optimization in neural network models, along with a novel parallel computing strategy, coined weighted aggregating stochastic gradient descent (WASGD). Following a theoretical analysis of the characteristics of the new objective function, WASGD introduces a decentralized weighted aggregating scheme based on the performance of local workers. Without any center variable, the new method automatically gauges the importance of local workers and weights their updates according to their contributions. Furthermore, we have developed an enhanced version of the method, WASGD+, by (1) implementing a designed sample order and (2) upgrading the weight evaluation function. To validate the new method, we benchmark our pipeline against several popular algorithms, including state-of-the-art deep neural network classifier training techniques (e.g., elastic averaging SGD). Comprehensive validation studies have been conducted on four classic datasets: CIFAR-100, CIFAR-10, Fashion-MNIST, and MNIST. The results firmly validate the superiority of the WASGD scheme in accelerating the training of deep architectures. Better still, the enhanced version, WASGD+, is shown to be a significant improvement over its prototype.
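The abstract's core idea, combining local workers' parameters with performance-based weights rather than a central variable, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `weighted_aggregate`, the softmax-over-negative-loss weighting, and the `beta` temperature are hypothetical stand-ins, not the paper's exact weight evaluation function.

```python
import numpy as np

def weighted_aggregate(params, losses, beta=1.0):
    """Combine local workers' parameter vectors by performance-based weights.

    Hypothetical sketch: a lower local loss yields a larger weight via a
    numerically stabilized softmax over -beta * loss. The actual WASGD
    weight evaluation function may differ.
    """
    losses = np.asarray(losses, dtype=float)
    # Subtract the minimum loss before exponentiating for numerical stability.
    w = np.exp(-beta * (losses - losses.min()))
    w /= w.sum()  # normalize so the weights sum to 1
    aggregated = sum(wi * p for wi, p in zip(w, params))
    return aggregated, w

# Toy example: three workers holding 2-D parameter vectors.
params = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, 0.5])]
losses = [0.2, 0.8, 0.4]  # worker 0 performs best
agg, weights = weighted_aggregate(params, losses)
```

In this sketch, the best-performing worker (lowest loss) receives the largest weight, so the aggregate is pulled toward its parameters, which is the decentralized, contribution-driven behavior the abstract describes.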
Keywords
Stochastic optimization, stochastic gradient descent, parallel computing, deep learning, neural network