Optimal Finite-Sum Smooth Non-Convex Optimization with SARAH.

arXiv: Optimization and Control (2019)

Abstract
The total complexity (measured as the total number of gradient computations) of a stochastic first-order optimization algorithm that finds a first-order stationary point of a finite-sum smooth nonconvex objective function $F(w)=\frac{1}{n}\sum_{i=1}^n f_i(w)$ has been proven to be at least $\Omega(\sqrt{n}/\epsilon)$, where $\epsilon$ denotes the attained accuracy $\mathbb{E}[\|\nabla F(\tilde{w})\|^2] \leq \epsilon$ for the output approximation $\tilde{w}$ (Fang et al., 2018). This paper is the first to show that this lower bound is tight for the class of variance-reduction methods which only assume the Lipschitz-continuous-gradient assumption. We prove this complexity result for a slightly modified version of the SARAH algorithm in (Nguyen et al., 2017a;b), showing that SARAH is optimal and dominates all existing results. For convex optimization, we propose SARAH++ with sublinear convergence for general convex problems and linear convergence for strongly convex problems; we also provide a practical version for which numerical experiments on various datasets show improved performance.
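For context, the abstract refers to SARAH's recursive gradient estimator from Nguyen et al. (2017a;b), not the modified variant analyzed in this paper. The sketch below is a minimal illustration of one SARAH outer iteration under those original rules; the function names (sarah_outer_iteration, grad_full, grad_i) and the uniform-random output rule used for the nonconvex setting are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def sarah_outer_iteration(w0, grad_full, grad_i, n, m, eta, rng=None):
    """Sketch of one SARAH outer iteration (after Nguyen et al., 2017).

    grad_full(w): full gradient (1/n) * sum_i grad f_i(w)
    grad_i(w, i): gradient of a single component f_i at w
    n: number of components, m: inner-loop length, eta: step size.
    """
    rng = rng or np.random.default_rng()
    w_prev = w0
    v = grad_full(w0)              # v_0: exact gradient at the snapshot
    w = w_prev - eta * v           # first step uses the exact gradient
    iterates = [w_prev, w]
    for _ in range(1, m + 1):
        i = rng.integers(n)        # sample one component uniformly
        # Recursive (biased) estimator: update the previous estimator
        # rather than correcting against a fixed snapshot as in SVRG.
        v = grad_i(w, i) - grad_i(w_prev, i) + v
        w_prev, w = w, w - eta * v
        iterates.append(w)
    # In the nonconvex analysis, the returned point is typically an
    # iterate chosen uniformly at random from the trajectory.
    return iterates[rng.integers(len(iterates))]
```

Each inner step costs two component-gradient evaluations, which is how the total complexity in the $\Omega(\sqrt{n}/\epsilon)$ lower bound is counted.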