Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods
CoRR (2023)
Abstract
In the past several years, the convergence of the last iterate of the
Stochastic Gradient Descent (SGD) algorithm has attracted considerable
interest due to its good performance in practice but lack of theoretical
understanding. For Lipschitz and convex functions, different works have
established the optimal $O(\log(1/\delta)\log T/\sqrt{T})$ or
$O(\sqrt{\log(1/\delta)/T})$ high-probability convergence rates for the final
iterate, where $T$ is the time horizon and $\delta$ is the failure
probability. However, to prove these bounds, all existing works are limited to
compact domains or require almost surely bounded noises. It is natural to ask
whether the last iterate of SGD can still guarantee the optimal convergence
rate without these two restrictive assumptions. Beyond this important
question, many theoretical problems remain open. For example, compared with
the last-iterate convergence of SGD for non-smooth problems, only a few
results for smooth optimization have been developed. Additionally, the
existing results are all limited to a non-composite objective and the standard
Euclidean norm. It remains unclear whether the last-iterate convergence can be
provably extended to the broader settings of composite optimization and
non-Euclidean norms. In this work, to address the issues mentioned above, we
revisit the last-iterate convergence of stochastic gradient methods and
provide the first unified way to prove the convergence rates, both in
expectation and in high probability, that accommodates general domains,
composite objectives, non-Euclidean norms, Lipschitz conditions, smoothness,
and (strong) convexity simultaneously. Additionally, we extend our analysis to
obtain the last-iterate convergence under heavy-tailed noises.
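For readers unfamiliar with the setting, the sketch below illustrates what "last-iterate convergence" refers to: projected SGD on a toy convex, Lipschitz objective that returns the final point $x_T$ instead of an iterate average. The function name `sgd_last_iterate`, the $c/\sqrt{t}$ step-size schedule, and the toy objective are illustrative assumptions for this sketch, not the algorithmic choices analyzed in the paper.

```python
import numpy as np

def sgd_last_iterate(grad_oracle, x0, T, radius, c=1.0, rng=None):
    """Projected SGD on a convex, Lipschitz objective over a Euclidean ball.

    Returns the LAST iterate x_T rather than the running average; the rates
    quoted in the abstract concern this final point. The c / sqrt(t)
    step-size schedule is an assumption made for this toy sketch.
    """
    rng = np.random.default_rng(rng)
    x = np.array(x0, dtype=float)
    for t in range(1, T + 1):
        g = grad_oracle(x, rng)           # stochastic (sub)gradient estimate
        x = x - (c / np.sqrt(t)) * g      # gradient step
        norm = np.linalg.norm(x)
        if norm > radius:                 # project back onto the feasible ball
            x = x * (radius / norm)
    return x                              # last iterate, no averaging

# Toy usage: minimize E|a^T x - b| (convex, Lipschitz) with noisy subgradients.
if __name__ == "__main__":
    d = 10
    x_star = np.ones(d) / np.sqrt(d)

    def grad_oracle(x, rng):
        a = rng.standard_normal(d)
        b = a @ x_star + 0.1 * rng.standard_normal()
        return np.sign(a @ x - b) * a     # subgradient of |a^T x - b|

    x_T = sgd_last_iterate(grad_oracle, np.zeros(d), T=20000, radius=2.0)
    print("distance to x*:", np.linalg.norm(x_T - x_star))
```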
Keywords
Convex Optimization, Stochastic Optimization, Last Iterate