# Regret Bounds without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses

NeurIPS 2020 (2020)

Abstract

In online convex optimization (OCO), Lipschitz continuity of the functions is commonly assumed in order to obtain sublinear regret. Moreover, many algorithms have only logarithmic regret when these functions are also strongly convex. Recently, researchers from convex optimization proposed the notions of "relative Lipschitz continuity" a…

Introduction

- In online convex optimization (OCO), at each of many rounds a player has to pick a point from a convex set while an adversary chooses a convex function that penalizes the player’s choice.
- Classical results show that if the cost functions are Lipschitz continuous, there are algorithms which suffer at most O(√T) regret in T rounds [Zinkevich, 2003].
- Lu [2019] extended the offline setting results by showing an O(1/T) convergence rate when the objective function is both relative Lipschitz continuous and relative strongly convex.
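The round-by-round OCO protocol above can be sketched with the classical online gradient descent method of Zinkevich [2003]. This is a minimal toy instance with linear losses f_t(x) = g_t·x over [-1, 1] and a 1/√t step size, chosen only to illustrate the O(√T) regret regime; it is not the paper's relative-Lipschitz setting, and the adversary here is just random signs.

```python
import numpy as np

# Toy OCO loop: online gradient descent on adversarial linear losses
# f_t(x) = g_t * x over the interval [-1, 1] (illustrative instance only).
rng = np.random.default_rng(0)
T = 1000
gs = rng.choice([-1.0, 1.0], size=T)        # subgradients, |g_t| <= 1 (Lipschitz)
x, cum_loss = 0.0, 0.0
for t, g in enumerate(gs, start=1):
    cum_loss += g * x                        # player pays f_t(x_t)
    # gradient step with eta_t = 1/sqrt(t), then project back onto [-1, 1]
    x = float(np.clip(x - g / np.sqrt(t), -1.0, 1.0))

best_fixed = min(gs.sum() * u for u in (-1.0, 1.0))  # best fixed point in hindsight
regret = cum_loss - best_fixed               # grows like O(sqrt(T))
```

With diameter 2 and unit Lipschitz constant, the standard analysis bounds this regret by roughly 3√T, which is what makes the Lipschitz assumption load-bearing in the classical result.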

Highlights

- In online convex optimization (OCO), at each of many rounds a player has to pick a point from a convex set while an adversary chooses a convex function that penalizes the player’s choice
- In Section 3.2 we showed that follow the regularized leader (FTRL) suffers at most logarithmic regret when the loss functions are Lipschitz continuous and strongly convex, both relative to the same fixed reference function
- In this paper we showed regret bounds for both FTRL and stabilized online mirror descent (OMD) in the relative setting proposed by Lu [2019]
- We gave logarithmic regret bounds for both algorithms when the functions are relatively strongly convex, analogous to the results known in the classical setting
- The first would be to investigate the connections among the different notions of relative smoothness, Lipschitz continuity, and strong convexity in the literature
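The logarithmic-regret phenomenon for strongly convex losses can be seen in the classical (non-relative) setting with gradient steps of size 1/(σt), in the style of Hazan et al. [2007]. This is a hypothetical toy instance on σ-strongly convex quadratics, chosen only to make the O(log T) growth visible; it is not the relative setting studied in the paper.

```python
import numpy as np

# Gradient steps with eta_t = 1/(sigma*t) on sigma-strongly convex quadratics
# f_t(x) = (sigma/2) * (x - c_t)^2; a toy instance illustrating O(log T) regret.
rng = np.random.default_rng(1)
T, sigma = 2000, 1.0
centers = rng.uniform(-1.0, 1.0, size=T)
x, cum_loss = 0.0, 0.0
for t, c in enumerate(centers, start=1):
    cum_loss += 0.5 * sigma * (x - c) ** 2
    x -= sigma * (x - c) / (sigma * t)       # step size 1/(sigma * t)

u = centers.mean()                            # minimizer of the summed losses
best = 0.5 * sigma * ((centers - u) ** 2).sum()
regret = cum_loss - best                      # O((G^2 / sigma) * log T)
```

With these steps the iterate is exactly the running mean of past centers, i.e. the follow-the-leader point, matching the highlight that FTL (FTRL without a regularizer) attains logarithmic regret here.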

Results

- In the following theorem the authors formally state the sublinear O(√T) regret bound of FTRL in T rounds in the setting where the cost functions are Lipschitz continuous relative to the regularizer function used in the FTRL method.
- The authors do so by combining the optimality conditions from the definition of the iterates in Algorithm 1 with the L-Lipschitz continuity relative to R of the loss functions.
- Hazan et al. [2007] showed that if the cost functions are not only Lipschitz continuous but also strongly convex, the follow the leader (FTL) method—FTRL without any regularizer—attains logarithmic regret.
- The best dependence on the Lipschitz constant and “distance to the comparison point” is usually achieved when the loss functions are Lipschitz continuous and the FTRL regularizer is strongly convex, both with respect to the same norm.
- The authors give a regret bound for DS-OMD when the cost functions are all Lipschitz continuous relative to the mirror map Φ.
- If the authors set each ft to be a fixed function f and take the average of all iterates, they obtain the following convergence rate for classical convex optimization as a corollary.
- The authors show that OMD suffers at most logarithmic regret if the authors have Lipschitz continuity and strong convexity, both relative to the mirror map Φ.
- In this paper the authors showed regret bounds for both FTRL and stabilized OMD in the relative setting proposed by Lu [2019].
- The authors gave logarithmic regret bounds for both algorithms when the functions are relatively strongly convex, analogous to the results known in the classical setting.
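The FTRL iterate in these results is defined by minimizing the accumulated (linearized) losses plus a scaled regularizer. In the Euclidean special case R(x) = ‖x‖²/2 over the unit ball, that minimizer has a closed form (scale, then project), which gives a compact sketch of the algorithm. The instance and step size below are illustration choices; the paper's point is precisely that non-Euclidean regularizers are also allowed via relative Lipschitz continuity.

```python
import numpy as np

# FTRL sketch with R(x) = ||x||^2 / 2 on linearized losses over the unit ball:
# x_{t+1} = argmin_x <sum_s g_s, x> + R(x)/eta  (closed form: scale + project).
rng = np.random.default_rng(2)
T, d = 500, 3
eta = 1.0 / np.sqrt(T)
G = rng.normal(size=(T, d))
G /= np.linalg.norm(G, axis=1, keepdims=True)   # unit-norm loss gradients

def project_ball(v):
    """Euclidean projection onto the unit ball."""
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

cum_g = np.zeros(d)
cum_loss = 0.0
for g in G:
    x = project_ball(-eta * cum_g)   # FTRL iterate for quadratic regularizer
    cum_loss += g @ x
    cum_g += g

best = -np.linalg.norm(cum_g)        # min over the ball of <sum g_t, u>
regret = cum_loss - best             # bounded by O(sqrt(T))
```

The standard FTRL analysis bounds this regret by R(u)/η + η·Σ‖g_t‖², which with η = 1/√T is at most about 1.5√T on this instance.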

Conclusion

- The latter was already an interesting question before notions of relative Lipschitz continuity and strong convexity were proposed, but these new ideas give more flexibility in the choice of a regularizer.
- In this paper the authors study the performance of online convex optimization algorithms when the functions are not necessarily Lipschitz continuous, a requirement in classical regret bounds.
- This not only widens the range of applications but also sheds light on the fundamental conditions on the cost functions and regularizers/mirror maps needed for OCO algorithms to have good guarantees.

Related Work

- Analyses of gradient descent methods in the differentiable convex setting usually require the objective function f to be Lipschitz smooth, that is, the gradient of the objective function f is Lipschitz continuous. Bauschke et al [2017] proposed a generalized Lipschitz smoothness condition, called relative Lipschitz smoothness, using Bregman divergences of a fixed reference function. They proposed a proximal mirror descent method called NoLips with a O(1/T) convergence rate for such functions. Van Nguyen [2017] independently developed similar ideas for analyzing the convergence of a Bregman proximal gradient method applied to convex composite functions in Banach spaces. Bolte et al [2018] extended the framework of Bauschke et al [2017] to the non-convex setting. Building upon this work, Lu et al [2018] slightly relaxed the definition of relative smoothness and gave simpler analyses for mirror descent and dual averaging. Hanzely and Richtárik [2018] proposed and analysed coordinate and stochastic gradient descent methods for relatively smooth functions. These ideas were later applied to non-convex problems by Mukkamala and Ochs [2019]. More recently, Gao et al [2020] analysed the coordinate descent method with composite Lipschitz smooth objectives. Unlike those prior works, in this paper we focus on the online case with nondifferentiable loss functions.
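The Bregman-divergence machinery these works build on is easiest to see in the classic non-Euclidean example: mirror descent with the negative-entropy mirror map on the probability simplex, i.e. the exponentiated gradient update. The fixed linear loss and step size below are illustration choices, not taken from any of the cited papers.

```python
import numpy as np

# Entropic mirror descent (exponentiated gradient) on the probability simplex:
# the negative-entropy mirror map turns the Bregman update into a multiplicative one.
def md_step(x, grad, eta):
    """One entropic mirror descent step: x_{t+1} proportional to x_t * exp(-eta * grad)."""
    y = x * np.exp(-eta * grad)
    return y / y.sum()

x = np.full(3, 1.0 / 3.0)              # start at the uniform distribution
loss_grad = np.array([1.0, 0.5, 0.0])  # a fixed linear loss, for illustration
for _ in range(200):
    x = md_step(x, loss_grad, eta=0.1)
# The iterate concentrates on the coordinate with the smallest loss.
```

Because the reference function here (negative entropy) is not strongly convex in the Euclidean sense over unbounded gradients, it is exactly the kind of mirror map for which relative notions of smoothness and Lipschitz continuity pay off.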

Funding

- This research was partially supported by the Canada CIFAR AI Chair Program and an NSERC Discovery Grant.

References

- K. Antonakopoulos, E. V. Belmega, and P. Mertikopoulos. Online and stochastic optimization beyond Lipschitz continuity: a Riemannian approach. In 8th International Conference on Learning Representations, ICLR, 2020.
- H. H. Bauschke, J. Bolte, and M. Teboulle. A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Mathematics of Operations Research, 42(2):330–348, 2017.
- A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.
- A. Ben-Tal and A. Nemirovski. Lectures on modern convex optimization. MPS/SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), 2001.
- J. Bolte, S. Sabach, M. Teboulle, and Y. Vaisbourd. First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM Journal on Optimization, 28(3):2131–2151, 2018.
- S. Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends® in Machine Learning, 8(3-4):231–357, 2015.
- J. C. Duchi, S. Shalev-Shwartz, Y. Singer, and A. Tewari. Composite objective mirror descent. In COLT 2010, pages 14–26. Omnipress, 2010.
- J. C. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res., 12:2121–2159, 2011.
- H. Fang, N. J. A. Harvey, V. S. Portella, and M. P. Friedlander. Online mirror descent and dual averaging: keeping pace in the dynamic case. 2020. URL https://arxiv.org/abs/2006.02585.
- T. Gao, S. Lu, J. Liu, and C. Chu. Randomized Bregman coordinate descent methods for non-Lipschitz optimization. arXiv preprint arXiv:2001.05202, 2020.
- B. Grimmer. Convergence rates for deterministic and stochastic subgradient methods without Lipschitz continuity. SIAM Journal on Optimization, 29(2):1350–1365, 2019.
- F. Hanzely and P. Richtárik. Fastest rates for stochastic mirror descent methods. 2018. URL http://arxiv.org/abs/1803.07374.
- E. Hazan. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2 (3-4):157–325, 2016. URL http://ocobook.cs.princeton.edu/OCObook.pdf.
- E. Hazan, A. Agarwal, and S. Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169–192, 2007.
- H. Lu. “Relative continuity” for non-Lipschitz nonsmooth convex optimization using stochastic (or deterministic) mirror descent. Informs Journal on Optimization, pages 265–352, 2019.
- H. Lu, R. M. Freund, and Y. Nesterov. Relatively smooth convex optimization by first-order methods, and applications. SIAM Journal on Optimization, 28(1):333–354, 2018.
- C. J. Maddison, D. Paulin, Y. W. Teh, B. O’Donoghue, and A. Doucet. Hamiltonian descent methods. arXiv preprint arXiv:1809.05042, 2018.
- H. B. McMahan. A survey of algorithms and analysis for adaptive online learning. The Journal of Machine Learning Research, 18(1):3117–3166, 2017.
- M. C. Mukkamala and P. Ochs. Beyond alternating updates for matrix factorization with inertial bregman proximal gradient algorithms. In Advances in Neural Information Processing Systems, pages 4268–4278, 2019.
- A. S. Nemirovsky and D. B. Yudin. Problem complexity and method efficiency in optimization. 1983.
- Y. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical programming, 120(1):221–259, 2009.
- F. Orabona and D. Pál. Scale-free online learning. Theoretical Computer Science, 716:50–69, 2018.
- R. T. Rockafellar. Convex analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ, 1997. ISBN 0-691-01586-4. Reprint of the 1970 original, Princeton Paperbacks.
- S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends® in Machine Learning, 4(2):107–194, 2011.
- Q. Van Nguyen. Forward-backward splitting with Bregman distances. Vietnam Journal of Mathematics, 45(3):519–539, 2017.
- L. Xiao. Dual averaging methods for regularized stochastic learning and online optimization. Journal of Machine Learning Research, 11(Oct):2543–2596, 2010.
- M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Machine Learning, Proceedings of the Twentieth International Conference (ICML 2003), pages 928–936, 2003.
- Proposition A.2 ([Antonakopoulos et al., 2020, Proposition 1]). Suppose that f: X → R is differentiable. Then f is L-RLC if and only if ‖grad f(x)‖_x ≤ L for all x ∈ X. (A.1)
