On the Last-Iterate Convergence of Shuffling Gradient Methods
arXiv (2024)
Abstract
Shuffling gradient methods, also known as stochastic gradient
descent (SGD) without replacement, are widely implemented in practice,
most notably in three popular variants: Random Reshuffle (RR), Shuffle
Once (SO), and Incremental Gradient (IG). In contrast to this empirical
success, the theoretical guarantees of shuffling gradient methods were
not well understood for a long time. Until recently, convergence rates
had only been established for the average iterate of convex functions
and for the last iterate of strongly convex problems (using squared
distance as the metric). However, when the function value gap is used as
the convergence criterion, existing theories cannot explain the good
performance of the last iterate in different settings (e.g., constrained
optimization). To bridge this gap between practice and theory, we prove
last-iterate convergence rates for shuffling gradient methods with
respect to the objective value, even without strong convexity. Our new
results either (nearly) match the existing last-iterate lower bounds or
are as fast as the previous best upper bounds for the average iterate.
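
For concreteness, the convergence criteria contrasted in the abstract can be written as follows. The notation here is illustrative rather than the paper's own: $x_K$ denotes the last iterate after $K$ epochs, $\bar{x}_K$ the average iterate, and $x_*$ a minimizer of $f$.

```latex
\underbrace{\mathbb{E}\!\left[f(x_K) - f(x_*)\right]}_{\text{last-iterate function value gap}}
\quad \text{vs.} \quad
\underbrace{\mathbb{E}\!\left[\|x_K - x_*\|^2\right]}_{\text{squared distance (strongly convex case)}}
\quad \text{vs.} \quad
\underbrace{\mathbb{E}\!\left[f(\bar{x}_K) - f(x_*)\right]}_{\text{average-iterate gap}}
```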
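To make the three sampling schemes named above concrete, here is a minimal sketch of one-pass-per-epoch shuffling SGD on a finite-sum objective. The function `shuffling_sgd`, its parameters, and the toy least-squares problem are assumptions for illustration, not the paper's setup or experiments.

```python
import numpy as np

def shuffling_sgd(grads, x0, lr, epochs, scheme="RR", seed=0):
    """Shuffling SGD: each epoch visits every component exactly once.

    grads  : list of callables; grads[i](x) returns the gradient of f_i at x
    scheme : "RR" (fresh permutation each epoch), "SO" (one permutation
             drawn up front and reused), or "IG" (fixed order 0..n-1)
    """
    rng = np.random.default_rng(seed)
    n = len(grads)
    x = np.asarray(x0, dtype=float)
    perm = rng.permutation(n)  # SO reuses this single permutation
    for _ in range(epochs):
        if scheme == "RR":
            perm = rng.permutation(n)  # reshuffle every epoch
        elif scheme == "IG":
            perm = np.arange(n)        # deterministic cyclic order
        for i in perm:
            x = x - lr * grads[i](x)   # one component step, no replacement
    return x

# Toy usage: f_i(x) = 0.5 * (a_i^T x - b_i)^2 on a small least-squares problem
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
grads = [lambda x, a=a, bi=bi: a * (a @ x - bi) for a, bi in zip(A, b)]
x_last = shuffling_sgd(grads, x0=np.zeros(2), lr=0.05, epochs=200, scheme="RR")
```

The only difference between the three variants is when the permutation is drawn; the paper's results concern the final `x` returned here (the last iterate), rather than an average of the iterates.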