Faster Derivative-Free Stochastic Algorithm for Shared Memory Machines.

ICML(2018)

引用 24|浏览43
暂无评分
摘要
Asynchronous parallel stochastic gradient optimization has been playing a pivotal role to solve large-scale machine learning problems in big data applications. Zeroth-order (derivative-free) methods estimate the gradient only by two function evaluations, thus have been applied to solve the problems where the explicit gradient calculations are computationally expensive or infeasible. Recently, the first asynchronous parallel stochastic zeroth-order algorithm (AsySZO) was proposed. However, its convergence rate is O(1/SQRT{T}) for the smooth, possibly non-convex learning problems, which is significantly slower than O(1/T) the best convergence rate of (asynchronous) stochastic gradient algorithm. To fill this gap, in this paper, we first point out the fundamental reason leading to the slow convergence rate of AsySZO, and then propose a new asynchronous stochastic zerothorder algorithm (AsySZO+). We provide a faster convergence rate O(1/bT) (b is the mini-batch size) for AsySZO+ by the rigorous theoretical analysis, which is a significant improvement over O(1/SQRT{T}). The experimental results on the application of ensemble learning confirm that our AsySZO+ has a faster convergence rate than the existing (asynchronous) stochastic zeroth-order algorithms.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要