Regret Bounds without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses

NeurIPS 2020


Abstract

In online convex optimization (OCO), Lipschitz continuity of the functions is commonly assumed in order to obtain sublinear regret. Moreover, many algorithms have only logarithmic regret when these functions are also strongly convex. Recently, researchers from convex optimization proposed the notions of "relative Lipschitz continuity" a…

Introduction
  • In online convex optimization (OCO), at each of many rounds a player has to pick a point from a convex set while an adversary chooses a convex function that penalizes the player’s choice.
  • Classical results show that if the cost functions are Lipschitz continuous, there are algorithms which suffer at most O(√T) regret in T rounds [Zinkevich, 2003].
  • Lu [2019] extended the offline setting results by showing an O(1/T) convergence rate when the objective function is both relatively Lipschitz continuous and relatively strongly convex.
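The OCO protocol above can be illustrated with a minimal sketch. The following is not the paper's relative-Lipschitz algorithm, only the textbook FTRL update with linearized losses and a quadratic regularizer over all of ℝᵈ, for which the leader has a closed form; the function name and step size are hypothetical.

```python
import numpy as np

def ftrl_quadratic(subgradients, eta):
    """FTRL with linearized losses and regularizer R(x) = ||x||^2 / (2*eta).

    The minimizer of sum_s <g_s, x> + R(x) over R^d has the closed form
    x_{t+1} = -eta * (g_1 + ... + g_t), i.e. dual averaging / lazy OGD.
    Returns the iterates x_1, ..., x_T played against the given subgradients.
    """
    d = len(subgradients[0])
    g_sum = np.zeros(d)
    iterates = []
    for g in subgradients:
        iterates.append(-eta * g_sum)  # play the regularized leader
        g_sum = g_sum + np.asarray(g, dtype=float)
    return iterates
```

With the step size tuned as η ∝ 1/√T, this scheme attains the classical O(√T) regret under Lipschitz continuity; the paper's point is what replaces that assumption when the losses are only relatively Lipschitz.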
Highlights
  • In online convex optimization (OCO), at each of many rounds a player has to pick a point from a convex set while an adversary chooses a convex function that penalizes the player’s choice
  • 5.2 Logarithmic Regret with Relative Strongly Convex Functions In Section 3.2 we showed that follow the regularized leader (FTRL) suffers at most logarithmic regret when the loss functions are Lipschitz continuous and strongly convex, both relative to the same fixed reference function
  • In this paper we showed regret bounds for both FTRL and stabilized online mirror descent (OMD) in the relative setting proposed by Lu [2019]
  • We gave logarithmic regret bounds for both algorithms when the functions are relatively strongly convex, analogous to the results known in the classical setting
  • The first would be to investigate the connections among the different notions of relative smoothness, Lipschitz continuity, and strong convexity in the literature
Results
  • In the following theorem the authors formally state the sublinear O(√T) regret bound of FTRL in T rounds in the setting where the cost functions are Lipschitz continuous relative to the regularizer function used in the FTRL method.
  • The authors do so by combining the optimality conditions from the definition of the iterates in Algorithm 1 with the L-Lipschitz continuity relative to R of the loss functions.
  • Hazan et al [2007] showed that if the cost functions are not only Lipschitz continuous but strongly convex as well, the follow-the-leader (FTL) method (FTRL without any regularizer) attains logarithmic regret.
  • The best dependence on the Lipschitz constant and “distance to the comparison point” is usually achieved when the loss functions are Lipschitz continuous and the FTRL regularizer is strongly convex, both with respect to the same norm.
  • The authors give a regret bound for DS-OMD when the cost functions are all Lipschitz continuous relative to the mirror map Φ.
  • If the authors set each ft to be a fixed function f and take the average of all iterates, they get the following convergence rate for classical convex optimization as a corollary.
  • The authors show that OMD suffers at most logarithmic regret if the authors have Lipschitz continuity and strong convexity, both relative to the mirror map Φ.
  • In this paper the authors showed regret bounds for both FTRL and stabilized OMD in the relative setting proposed by Lu [2019].
  • The authors gave logarithmic regret bounds for both algorithms when the functions are relatively strongly convex, analogous to the results known in the classical setting.
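To make the mirror-map machinery in these results concrete, here is a sketch of a single classical OMD step for one specific mirror map (negative entropy on the probability simplex, i.e. the exponentiated-gradient rule). This is not the dual-stabilized variant analysed in the paper; the function name and inputs are illustrative.

```python
import numpy as np

def md_entropy_step(x, g, eta):
    """One mirror descent step with mirror map Phi(x) = sum_i x_i log x_i.

    The update is taken in the dual: grad Phi(y) = grad Phi(x) - eta * g,
    i.e. log y_i = log x_i - eta * g_i, followed by a Bregman projection
    back onto the simplex, which for negative entropy is normalization.
    """
    y = x * np.exp(-eta * g)
    return y / y.sum()
```

Mass shifts away from coordinates with large subgradient entries, as expected: the dual step penalizes them multiplicatively before renormalization.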
Conclusion
  • The latter was already an interesting question before notions of relative Lipschitz continuity and strong convexity were proposed, but these new ideas give more flexibility in the choice of a regularizer.
  • In this paper the authors study the performance of online convex optimization algorithms when the functions are not necessarily Lipschitz continuous, a requirement in classical regret bounds.
  • This not only opens up the range of applications, but also sheds light on the fundamental conditions on the cost functions and regularizers/mirror maps needed for OCO algorithms to have good guarantees.
Related Work
  • Analyses of gradient descent methods in the differentiable convex setting usually require the objective function f to be Lipschitz smooth, that is, the gradient of the objective function f is Lipschitz continuous. Bauschke et al [2017] proposed a generalized Lipschitz smoothness condition, called relative Lipschitz smoothness, using Bregman divergences of a fixed reference function. They proposed a proximal mirror descent method called NoLips with a O(1/T) convergence rate for such functions. Van Nguyen [2017] independently developed similar ideas for analyzing the convergence of a Bregman proximal gradient method applied to convex composite functions in Banach spaces. Bolte et al [2018] extended the framework of Bauschke et al [2017] to the non-convex setting. Building upon this work, Lu et al [2018] slightly relaxed the definition of relative smoothness and gave simpler analyses for mirror descent and dual averaging. Hanzely and Richtárik [2018] proposed and analysed coordinate and stochastic gradient descent methods for relatively smooth functions. These ideas were later applied to non-convex problems by Mukkamala and Ochs [2019]. More recently, Gao et al [2020] analysed the coordinate descent method with composite Lipschitz smooth objectives. Unlike those prior works, in this paper we focus on the online case with nondifferentiable loss functions.
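The Bregman divergence underlying all of these relative notions is simple to state. The sketch below computes it for a generic differentiable reference function; the helper names are hypothetical, and the example checks the standard fact that the squared Euclidean norm recovers half the squared distance.

```python
import numpy as np

def bregman_div(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>.

    Relative smoothness, Lipschitz continuity, and strong convexity are all
    stated by comparing a loss function against D_phi for a fixed reference phi.
    """
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# Reference function phi(x) = 0.5 * ||x||^2, for which
# D_phi(x, y) = 0.5 * ||x - y||^2 recovers the Euclidean case.
sq = lambda v: 0.5 * np.dot(v, v)
grad_sq = lambda v: v
```

Choosing a non-Euclidean reference (e.g. negative entropy, giving the KL divergence) is precisely the extra flexibility the relative setting exploits.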
Funding
  • This research was partially supported by the Canada CIFAR AI Chair Program and an NSERC Discovery Grant.
References
  • K. Antonakopoulos, E. V. Belmega, and P. Mertikopoulos. Online and stochastic optimization beyond Lipschitz continuity: A Riemannian approach. In 8th International Conference on Learning Representations, ICLR, 2020.
  • H. H. Bauschke, J. Bolte, and M. Teboulle. A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Mathematics of Operations Research, 42(2):330–348, 2017.
  • A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.
  • A. Ben-Tal and A. Nemirovski. Lectures on modern convex optimization. MPS/SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), 2001.
  • J. Bolte, S. Sabach, M. Teboulle, and Y. Vaisbourd. First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM Journal on Optimization, 28(3):2131–2151, 2018.
  • S. Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends® in Machine Learning, 8(3-4):231–357, 2015.
  • J. C. Duchi, S. Shalev-Shwartz, Y. Singer, and A. Tewari. Composite objective mirror descent. In COLT 2010, pages 14–26. Omnipress, 2010.
  • J. C. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, 2011.
  • H. Fang, N. J. A. Harvey, V. S. Portella, and M. P. Friedlander. Online mirror descent and dual averaging: keeping pace in the dynamic case. 2020. URL https://arxiv.org/abs/2006.02585.
  • T. Gao, S. Lu, J. Liu, and C. Chu. Randomized Bregman coordinate descent methods for non-Lipschitz optimization. arXiv preprint arXiv:2001.05202, 2020.
  • B. Grimmer. Convergence rates for deterministic and stochastic subgradient methods without Lipschitz continuity. SIAM Journal on Optimization, 29(2):1350–1365, 2019.
  • F. Hanzely and P. Richtárik. Fastest rates for stochastic mirror descent methods. 2018. URL http://arxiv.org/abs/1803.07374.
  • E. Hazan. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4):157–325, 2016. URL http://ocobook.cs.princeton.edu/OCObook.pdf.
  • E. Hazan, A. Agarwal, and S. Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169–192, 2007.
  • H. Lu. "Relative continuity" for non-Lipschitz nonsmooth convex optimization using stochastic (or deterministic) mirror descent. INFORMS Journal on Optimization, pages 265–352, 2019.
  • H. Lu, R. M. Freund, and Y. Nesterov. Relatively smooth convex optimization by first-order methods, and applications. SIAM Journal on Optimization, 28(1):333–354, 2018.
  • C. J. Maddison, D. Paulin, Y. W. Teh, B. O'Donoghue, and A. Doucet. Hamiltonian descent methods. arXiv preprint arXiv:1809.05042, 2018.
  • H. B. McMahan. A survey of algorithms and analysis for adaptive online learning. The Journal of Machine Learning Research, 18(1):3117–3166, 2017.
  • M. C. Mukkamala and P. Ochs. Beyond alternating updates for matrix factorization with inertial Bregman proximal gradient algorithms. In Advances in Neural Information Processing Systems, pages 4268–4278, 2019.
  • A. S. Nemirovsky and D. B. Yudin. Problem complexity and method efficiency in optimization. 1983.
  • Y. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 120(1):221–259, 2009.
  • F. Orabona and D. Pál. Scale-free online learning. Theoretical Computer Science, 716:50–69, 2018.
  • R. T. Rockafellar. Convex analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ, 1997. ISBN 0-691-01586-4. Reprint of the 1970 original.
  • S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends® in Machine Learning, 4(2):107–194, 2011.
  • Q. Van Nguyen. Forward-backward splitting with Bregman distances. Vietnam Journal of Mathematics, 45(3):519–539, 2017.
  • L. Xiao. Dual averaging methods for regularized stochastic learning and online optimization. Journal of Machine Learning Research, 11(Oct):2543–2596, 2010.
  • M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Machine Learning, Proceedings of the Twentieth International Conference (ICML 2003), pages 928–936, 2003.
  • Proposition A.2 ([Antonakopoulos et al., 2020, Proposition 1]). Suppose that f: X → R is differentiable. Then f is L-RLC if and only if ‖grad f(x)‖ₓ ≤ L for all x ∈ X. (A.1)
Authors
Yihan Zhou
Victor Sanches Portella
Mark Schmidt
Nicholas Harvey