Semantics-Preserving Parallelization of Stochastic Gradient Descent

2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2018

Abstract
Stochastic gradient descent (SGD) is a well-known method for regression and classification tasks. However, it is an inherently sequential algorithm: at each step, the processing of the current example depends on the parameters learned from previous examples. Prior approaches to parallelizing linear learners using SGD, such as Hogwild! and AllReduce, do not honor these dependencies across threads and thus can potentially suffer poor convergence rates and/or poor scalability. This paper proposes SymSGD, a parallel SGD algorithm that, to a first-order approximation, retains the sequential semantics of SGD. Each thread learns a local model in addition to a model combiner, which allows local models to be combined to produce the same result as what a sequential SGD would have produced. Our evaluation of SymSGD's accuracy and performance on 6 datasets on a shared-memory machine shows up to 11x speedup over our heavily optimized sequential baseline on 16 cores and, on average, a 2.2x speedup over Hogwild!.
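To make the model-combiner idea concrete, the following is a minimal sketch for the special case of linear regression with squared loss, where each SGD step is an affine map of the model and a block of steps can therefore be summarized by a matrix that replays the block from any starting model. The function names (sgd_block, combine), the toy data, and the use of full, unprojected combiner matrices are illustrative assumptions; the paper's actual algorithm additionally uses dimensionality reduction to keep the combiner cheap, which this sketch omits.

```python
# Sketch of the model-combiner idea for linear regression SGD (squared loss).
# One SGD step is affine in the model:
#   w <- w - a*(x.w - y)*x  ==  (I - a*x x^T) w + a*y*x
# so a block of steps is summarized by a matrix M letting a thread's work be
# replayed as if it had started from a different model.
import numpy as np

def sgd_block(w0, X, y, lr):
    """Run sequential SGD on one block from w0. Also return the combiner M,
    so that for any start w: result(w) = local_model + M @ (w - w0)."""
    d = w0.size
    w = w0.copy()
    M = np.eye(d)
    for x, t in zip(X, y):
        w = w - lr * (x @ w - t) * x               # one SGD step (squared loss)
        M = (np.eye(d) - lr * np.outer(x, x)) @ M  # accumulate affine linear part
    return w, M

def combine(w_prev, w0, local_model, M):
    """Adjust a block's local model as if it had started from w_prev."""
    return local_model + M @ (w_prev - w0)

rng = np.random.default_rng(0)
d, n, lr = 5, 40, 0.05
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
w0 = np.zeros(d)

# Sequential reference: block 1 followed by block 2.
w_seq, _ = sgd_block(w0, X, y, lr)

# "Parallel": both blocks start from the same snapshot w0; block 2's combiner
# then replays its updates on top of block 1's result.
w1, _  = sgd_block(w0, X[:20], y[:20], lr)
l2, M2 = sgd_block(w0, X[20:], y[20:], lr)
w_par = combine(w1, w0, l2, M2)

print(np.allclose(w_seq, w_par))  # True: the combined result matches sequential SGD
```

For this squared-loss case the combination is exact; for more general models the paper relies on a first-order (Taylor) approximation, and on random projections of M, so the match is approximate rather than bitwise.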
Keywords
Stochastic Gradient Descent,Machine Learning,Parallel Algorithms