Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic
arXiv (2019)
Abstract
This paper proposes SplitSGD, a new dynamic learning rate schedule for stochastic optimization. The method decreases the learning rate, for better adaptation to the local geometry of the objective function, whenever a stationary phase is detected, that is, when the iterates are likely to be bouncing around the vicinity of a local minimum. The detection is performed by splitting the single optimization thread into two and using the inner product of the gradients from the two threads as a measure of stationarity. Owing to this simple yet provably valid stationarity detection, SplitSGD is easy to implement and incurs essentially no additional computational cost over standard SGD. Through a series of extensive experiments, we show that the method is suitable both for convex problems and for training (non-convex) neural networks, with performance comparing favorably to other stochastic optimization methods. Importantly, the method is observed to be very robust across a wide range of problems with a single set of default parameters and, moreover, can yield better generalization performance than other adaptive gradient methods such as Adam.
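The splitting diagnostic described in the abstract lends itself to a compact implementation. The following NumPy sketch illustrates the idea on a noisy quadratic objective: two short SGD threads are launched from the same iterate, and the sign of the inner product between their averaged gradients is used to decide whether to decay the learning rate. The parameter names (w steps per thread, q splits, decay factor gamma) and the majority-vote rule are illustrative assumptions, not the paper's exact procedure.

    # Minimal sketch of a SplitSGD-style stationarity diagnostic.
    # Hypothetical parameters: W steps per thread, Q splits, decay factor GAMMA.
    import numpy as np

    rng = np.random.default_rng(0)
    Q, W, GAMMA = 5, 20, 0.5  # assumed values, not taken from the paper

    def noisy_grad(x, noise=1.0):
        # Stochastic gradient of f(x) = 0.5 * ||x||^2 with additive Gaussian noise.
        return x + noise * rng.standard_normal(x.shape)

    def splitting_diagnostic(x, lr, w=W, q=Q):
        """Run q pairs of short SGD threads from the same iterate and count how
        often the two averaged gradients point in opposite directions
        (negative inner product), which indicates a stationary phase."""
        negatives = 0
        for _ in range(q):
            avg_grads = []
            for _thread in range(2):
                y, g_sum = x.copy(), np.zeros_like(x)
                for _ in range(w):
                    g = noisy_grad(y)
                    g_sum += g
                    y -= lr * g
                avg_grads.append(g_sum / w)
            if np.dot(avg_grads[0], avg_grads[1]) < 0:
                negatives += 1
        return negatives

    # Driver: decay the learning rate when a majority of splits look stationary.
    x, lr = rng.standard_normal(10), 0.5
    for epoch in range(10):
        for _ in range(200):                     # ordinary single-thread SGD phase
            x -= lr * noisy_grad(x)
        if splitting_diagnostic(x, lr) > Q // 2:  # majority vote over Q splits
            lr *= GAMMA
        print(f"epoch {epoch}: lr = {lr:.4f}, ||x|| = {np.linalg.norm(x):.4f}")

In this toy run the learning rate stays constant while the iterates make progress and is halved once they start oscillating around the minimum, which is the qualitative behavior the abstract describes.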