A Catalyst Framework for Minimax Optimization

NeurIPS 2020

Abstract

We introduce a generic two-loop scheme for smooth minimax optimization with strongly-convex-concave objectives. Our approach applies the accelerated proximal point framework (or Catalyst) to the associated dual problem and takes full advantage of existing gradient-based algorithms to solve a sequence of well-balanced strongly-convex-strongly-concave…

Introduction
  • Minimax optimization has been extensively studied in past decades in the communities of mathematics, economics, and operations research.
  • It is unclear how these sophisticated algorithms can be integrated with variance-reduction techniques to solve strongly-convex-concave minimax problems with finite-sum structure efficiently.
  • Most existing variance-reduced algorithms in minimax optimization focus on the strongly-convex-strongly-concave setting, e.g., SVRG and SAGA [35], SPD1-VR [40], SVRE [5], Point-SAGA [23], primal-dual SVRG [10], etc. (a representative variance-reduced update is sketched below).
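To make the finite-sum setting above concrete, here is a minimal sketch (not taken from the paper) of an SVRG-style variance-reduced update for a saddle-point problem f(x, y) = (1/n) Σᵢ fᵢ(x, y); the function names, signatures, and step size are illustrative assumptions.

```python
def svrg_saddle_step(x, y, snapshot, full_grads, grad_i, i, eta):
    """One SVRG-style step on the saddle operator (grad_x f, -grad_y f).

    snapshot   : (x_tilde, y_tilde), the point where the full gradient was last computed
    full_grads : (gx_full, gy_full), full-batch gradients of f at the snapshot
    grad_i     : callable (x, y, i) -> (grad_x f_i, grad_y f_i) for component i
    eta        : step size (illustrative; tuning depends on smoothness and strong convexity)
    """
    x_tilde, y_tilde = snapshot
    gx_full, gy_full = full_grads
    gx_i, gy_i = grad_i(x, y, i)                # sampled component gradient at the current point
    gx_ti, gy_ti = grad_i(x_tilde, y_tilde, i)  # same component at the snapshot
    vx = gx_i - gx_ti + gx_full                 # variance-reduced estimate of grad_x f
    vy = gy_i - gy_ti + gy_full                 # variance-reduced estimate of grad_y f
    return x - eta * vx, y + eta * vy           # descend in x, ascend in y
```

In practice the snapshot and the full gradients are refreshed every epoch; SVRE [5] additionally combines such estimates with extragradient-style extrapolation.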
Highlights
  • Minimax optimization has been extensively studied in past decades in the communities of mathematics, economics, and operations research
  • Recent years have witnessed a surge of its applications in machine learning, including generative adversarial networks [14], adversarial training [44, 25], distributionally robust optimization [28, 1], reinforcement learning [7, 8], and many others
  • To the best of our knowledge, the design of efficient variance reduction methods for finite-sum structured minimax problems under the strongly-convex-concave or nonconvex-concave settings remains largely unexplored. This raises the question: can we bring the rich set of off-the-shelf methods designed for strongly-convex-strongly-concave minimax problems to these unexplored settings of interest? Inspired by the success of the Catalyst framework that uses gradient-based algorithms originally designed for strongly convex minimization problems to minimize convex/nonconvex objectives [19, 36], we introduce a generic Catalyst framework for minimax optimization.
  • (ii) For nonconvex-concave minimax optimization, we provide a simple two-time-scale inexact proximal point algorithm for finding an ε-stationary point that matches the state-of-the-art complexity of O(ℓ²ε⁻³), where ℓ denotes the smoothness constant.
  • We discuss the optimal choice of the regularization parameter τ for different settings
  • When extended to nonconvex-concave minimax optimization, our algorithm again achieves the state-of-the-art complexity for finding a stationary point.
  • A key observation is that by setting τ = μ, the auxiliary problem becomes (μ, μ)-SC-SC, and it is known that the simple extragradient method (EG) or optimistic gradient descent ascent (OGDA) achieves the optimal complexity for solving this class of well-balanced SC-SC problems [47] (see the extragradient sketch below).
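As an illustration of the point above, the following is a minimal sketch of one extragradient step applied to an auxiliary objective f(x, y) − (τ/2)‖y − ȳ‖² with τ = μ; the exact auxiliary problem, prox center, and step size used in the paper may differ, so treat these as assumptions.

```python
def extragradient_step(x, y, grad_x_f, grad_y_f, y_bar, tau, eta):
    """One EG step for  min_x max_y  f(x, y) - (tau/2) * ||y - y_bar||^2.

    With f mu-strongly-convex in x and tau = mu, this auxiliary problem is the
    well-balanced (mu, mu)-SC-SC case discussed above.

    grad_x_f, grad_y_f : callables (x, y) -> partial gradients of f (assumed interface)
    eta                : step size, e.g. on the order of 1/L for an L-smooth objective
    """
    def gy(u, v):
        # gradient in y of the regularized (strongly concave) part
        return grad_y_f(u, v) - tau * (v - y_bar)

    # extrapolation step
    xh = x - eta * grad_x_f(x, y)
    yh = y + eta * gy(x, y)
    # update step, using gradients at the extrapolated point
    return x - eta * grad_x_f(xh, yh), y + eta * gy(xh, yh)
```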
Results
  • Inspired by the success of the Catalyst framework that uses gradient-based algorithms originally designed for strongly convex minimization problems to minimize convex/nonconvex objectives [19, 36], the authors introduce a generic Catalyst framework for minimax optimization.
  • Rooted in an inexact accelerated proximal point framework, the idea is to repeatedly solve a sequence of auxiliary strongly-convex-strongly-concave problems, obtained by adding quadratic proximal regularization in x and y to the original objective, using an existing method M (a schematic outer loop is sketched after this list).
  • The MINIMAX-APPA algorithm [21] uses τx = 1 and τy = O(ε), which results in extra complications in solving the auxiliary problems.
  • Based on the generic Catalyst framework, the authors establish a number of interesting results: (i) For strongly-convex-concave minimax optimization, the authors develop a family of two-loop algorithms with near-optimal complexity and a reduced order of the logarithmic factor; in particular, combining Catalyst with the extragradient method yields the complexity O(ℓ/√(με) · log(1/ε)).
  • The authors focus on solving strongly-convex-concave minimax problems and introduce a general Catalyst scheme.
  • The subroutines used to solve the auxiliary minimax problems and the choices of regularization parameters in these works are quite distinct from ours.
  • Suppose Assumptions 1 and 2 hold, and the subproblems are solved by a linearly convergent algorithm M to satisfy the stopping criterion (3) or (6) with accuracy ε(t) as specified in Theorem 3.1.
  • A key observation is that by setting τ = μ, the auxiliary problem becomes (μ, μ)-SC-SC, and it is known that simple EG or OGDA achieves the optimal complexity for solving this class of well-balanced SC-SC problems [47].
  • The subproblem has a finite-sum structure and can be solved by a number of linearly convergent variance-reduced algorithms, such as SVRG, SAGA [35], and SVRE [5].
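Below is a minimal sketch of the generic two-loop scheme described in this section, with the accelerated update of the prox centers and the exact stopping criterion omitted; the auxiliary-problem form and parameter choices are assumptions for illustration, not the authors' precise algorithm.

```python
def catalyst_minimax(grad_x_f, grad_y_f, solver_M, x0, y0, tau_x, tau_y,
                     n_outer, inner_tol):
    """Two-loop Catalyst-style scheme for min_x max_y f(x, y).

    solver_M : any linearly convergent SC-SC solver, e.g. EG/OGDA or a
               variance-reduced method such as SVRG/SAGA/SVRE for finite sums;
               assumed signature solver_M(aux_gx, aux_gy, x_init, y_init, tol) -> (x, y)
    tau_x, tau_y : proximal regularization parameters (e.g. tau_x = 0, tau_y = mu
               in the strongly-convex-concave case discussed above) -- illustrative
    """
    x, y = x0, y0
    x_bar, y_bar = x0, y0  # prox centers (accelerated extrapolation omitted for brevity)
    for _ in range(n_outer):
        # Auxiliary SC-SC problem:
        #   min_x max_y  f(x, y) + (tau_x/2)||x - x_bar||^2 - (tau_y/2)||y - y_bar||^2
        aux_gx = lambda u, v, xb=x_bar: grad_x_f(u, v) + tau_x * (u - xb)
        aux_gy = lambda u, v, yb=y_bar: grad_y_f(u, v) - tau_y * (v - yb)
        x, y = solver_M(aux_gx, aux_gy, x, y, inner_tol)  # warm start from the last iterate
        x_bar, y_bar = x, y
    return x, y
```

Plugging EG in as solver_M corresponds to the Catalyst-EG variant compared in the experiments, while an SVRG-type solver gives Catalyst-SVRG.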
Conclusion
  • If the authors set τx = 2ℓ, choose τy appropriately, and use EG/OGDA/GDA to solve the subproblems, Algorithm 2 finds an ε-stationary point with a total of O(ℓ²ε⁻³) gradient evaluations.
  • In the experiments, the authors compare four algorithms, extragradient (EG), SVRG, Catalyst-EG, and Catalyst-SVRG, with best-tuned stepsizes, and evaluate their errors based on (a) distance to the limit point, ‖pt − p∗‖ + ‖σt − σ∗‖, and (b) norm of the gradient mapping, ‖∇pf(pt, σt)‖ + ‖σt − PΣ(σt + β∇σf(pt, σt))‖/β (a helper for computing these metrics is sketched after this list).
  • Although EG with averaged iterates has an optimal complexity of O(1/ε) for solving convex-concave minimax problems [29], its convergence behavior for SC-C minimax optimization remains unknown.
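The two error measures above can be computed as in the following sketch; it assumes p is unconstrained, σ lives in a set Σ with an available Euclidean projection, and β matches the stepsize used in the gradient-mapping definition (all names here are illustrative).

```python
import numpy as np

def error_metrics(p, sigma, p_star, sigma_star, grad_p_f, grad_sigma_f,
                  proj_Sigma, beta):
    """(a) distance to the limit point and (b) norm of the gradient mapping."""
    # (a) ||p_t - p*|| + ||sigma_t - sigma*||
    dist = np.linalg.norm(p - p_star) + np.linalg.norm(sigma - sigma_star)
    # (b) ||grad_p f|| + ||sigma_t - P_Sigma(sigma_t + beta * grad_sigma f)|| / beta
    gm = np.linalg.norm(grad_p_f(p, sigma)) \
         + np.linalg.norm(sigma - proj_Sigma(sigma + beta * grad_sigma_f(p, sigma))) / beta
    return dist, gm
```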
Tables
  • Table 1: Comparison with other algorithms for the general strongly-convex-concave setting. For simplicity, we ignore the dependency on ℓ and μ inside the log. The lower complexity bound is Ω(ℓ/√(με)).
  • Table 2: The table summarizes the optimal choice of the regularization parameter τ and the total complexity of the proposed Catalyst framework for finite-sum SC-C minimax optimization with f(x, y) = (1/n) Σᵢ fᵢ(x, y).
Funding
  • Acknowledgments and Disclosure of Funding: This work was supported in part by ONR grant W911NF-15-1-0479, NSF CCF-1704970, and NSF CMMI-1761699.
References
  • S. S. Abadeh, P. M. M. Esfahani, and D. Kuhn. Distributionally robust logistic regression. In Advances in Neural Information Processing Systems, pages 1576–1584, 2015.
  • W. Azizian, I. Mitliagkas, S. Lacoste-Julien, and G. Gidel. A tight and unified analysis of extragradient for a whole spectrum of differentiable games. arXiv preprint arXiv:1906.05945, 2019.
  • S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004.
  • A. Chambolle and T. Pock. On the ergodic convergence rates of a first-order primal–dual algorithm. Mathematical Programming, 159(1-2):253–287, 2016.
  • T. Chavdarova, G. Gidel, F. Fleuret, and S. Lacoste-Julien. Reducing noise in GAN training with variance reduced extragradient. In Advances in Neural Information Processing Systems, pages 391–401, 2019.
  • Y. Chen, G. Lan, and Y. Ouyang. Optimal primal-dual methods for a class of saddle point problems. SIAM Journal on Optimization, 24(4):1779–1814, 2014.
  • B. Dai, N. He, Y. Pan, B. Boots, and L. Song. Learning from conditional distributions via dual embeddings. In Artificial Intelligence and Statistics, pages 1458–1467, 2017.
  • B. Dai, A. Shaw, L. Li, L. Xiao, N. He, Z. Liu, J. Chen, and L. Song. SBEED: Convergent reinforcement learning with nonlinear function approximation. In International Conference on Machine Learning, pages 1125–1134, 2018.
  • D. Davis and D. Drusvyatskiy. Stochastic subgradient method converges at the rate O(k^(−1/4)) on weakly convex functions. arXiv preprint arXiv:1802.02988, 2018.
  • S. S. Du and W. Hu. Linear convergence of the primal-dual gradient method for convex-concave saddle point problems without strong convexity. arXiv preprint arXiv:1802.01504, 2018.
  • F. Facchinei and J.-S. Pang. Finite-dimensional variational inequalities and complementarity problems. Springer Science & Business Media, 2007.
  • A. Garnaev and W. Trappe. An eavesdropping game with SINR as an objective function. In International Conference on Security and Privacy in Communication Systems, pages 142–162.
  • G. Gidel, H. Berard, G. Vignoud, P. Vincent, and S. Lacoste-Julien. A variational inequality perspective on generative adversarial networks. arXiv preprint arXiv:1802.10551, 2018.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
  • N. He, A. Juditsky, and A. Nemirovski. Mirror prox algorithm for multi-term composite minimization and semi-separable problems. Computational Optimization and Applications, 61(2):275–319, 2015.
  • M. Kang, M. Kang, and M. Jung. Inexact accelerated augmented Lagrangian methods. Computational Optimization and Applications, 62(2):373–404, 2015.
  • W. Kong and R. D. Monteiro. An accelerated inexact proximal point method for solving nonconvex-concave min-max problems. arXiv preprint arXiv:1905.13433, 2019.
  • G. Korpelevich. The extragradient method for finding saddle points and other problems. Matecon, 12:747–756, 1976.
  • H. Lin, J. Mairal, and Z. Harchaoui. Catalyst acceleration for first-order convex optimization: from theory to practice. The Journal of Machine Learning Research, 18(1):7854–7907, 2017.
  • Q. Lin, M. Liu, H. Rafique, and T. Yang. Solving weakly-convex-weakly-concave saddle-point problems as successive strongly monotone variational inequalities. arXiv preprint arXiv:1810.10207, 2018.
  • T. Lin, C. Jin, and M. Jordan. Near-optimal algorithms for minimax optimization. arXiv preprint arXiv:2002.02417, 2020.
  • S. Lu, I. Tsaknakis, M. Hong, and Y. Chen. Hybrid block successive approximation for one-sided non-convex min-max problems: algorithms and applications. IEEE Transactions on Signal Processing, 2020.
  • L. Luo, C. Chen, Y. Li, G. Xie, and Z. Zhang. A stochastic proximal point algorithm for saddle-point problems. arXiv preprint arXiv:1909.06946, 2019.
  • L. Luo, H. Ye, and T. Zhang. Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems. arXiv preprint arXiv:2001.03724, 2020.
  • A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
  • A. Mokhtari, A. Ozdaglar, and S. Pattathil. A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach. arXiv preprint arXiv:1901.08511, 2019.
  • R. D. Monteiro and B. F. Svaiter. On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean. SIAM Journal on Optimization, 20(6):2755–2787, 2010.
  • H. Namkoong and J. C. Duchi. Variance-based regularization with convex objectives. In Advances in Neural Information Processing Systems, pages 2971–2980, 2017.
  • A. Nemirovski. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229–251, 2004.
  • A. Nemirovsky and D. Yudin. Problem complexity and method efficiency in optimization. 1983.
  • Y. Nesterov. Dual extrapolation and its applications to solving variational inequalities and related problems. Mathematical Programming, 109(2-3):319–344, 2007.
  • Y. Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2013.
  • D. M. Ostrovskii, A. Lowy, and M. Razaviyayn. Efficient search of first-order Nash equilibria in nonconvex-concave smooth min-max problems. arXiv preprint arXiv:2002.07919, 2020.
  • Y. Ouyang and Y. Xu. Lower complexity bounds of first-order methods for convex-concave bilinear saddle-point problems. Mathematical Programming, pages 1–35, 2019.
  • B. Palaniappan and F. Bach. Stochastic variance reduction methods for saddle-point problems. In Advances in Neural Information Processing Systems, pages 1416–1424, 2016.
  • C. Paquette, H. Lin, D. Drusvyatskiy, J. Mairal, and Z. Harchaoui. Catalyst acceleration for gradient-based non-convex optimization. arXiv preprint arXiv:1703.10993, 2017.
  • H. Rafique, M. Liu, Q. Lin, and T. Yang. Non-convex min-max optimization: Provable algorithms and applications in machine learning. arXiv preprint arXiv:1810.02060, 2018.
  • R. T. Rockafellar. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5):877–898, 1976.
  • M. Sibony. Méthodes itératives pour les équations et inéquations aux dérivées partielles non linéaires de type monotone. Calcolo, 7(1-2):65–183, 1970.
  • C. Tan, T. Zhang, S. Ma, and J. Liu. Stochastic primal-dual method for empirical risk minimization with O(1) per-iteration complexity. In Advances in Neural Information Processing Systems, pages 8366–8375, 2018.
  • K. K. Thekumparampil, P. Jain, P. Netrapalli, and S. Oh. Efficient algorithms for smooth minimax optimization. In Advances in Neural Information Processing Systems, pages 12659–12670, 2019.
  • P. Tseng. On linear convergence of iterative methods for the variational inequality problem. Journal of Computational and Applied Mathematics, 60(1-2):237–252, 1995.
  • P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM Journal on Optimization, 2008.
  • J. Wang, T. Zhang, S. Liu, P.-Y. Chen, J. Xu, M. Fardad, and B. Li. Towards a unified min-max framework for adversarial exploration and robustness.
  • Z. Xie and J. Shi. Accelerated primal dual method for a class of saddle point problem with strongly convex component. arXiv preprint arXiv:1906.07691, 2019.
  • J. Yang, N. Kiyavash, and N. He. Global convergence and variance-reduced optimization for a class of nonconvex-nonconcave minimax problems. arXiv preprint arXiv:2002.09621, 2020.
  • J. Zhang, M. Hong, and S. Zhang. On lower iteration complexity bounds for the saddle point problems. arXiv preprint arXiv:1912.07481, 2019.
  • R. Zhao. A primal dual smoothing framework for max-structured nonconvex optimization. arXiv preprint arXiv:2003.04375, 2020.
Authors
Junchi Yang
Siqi Zhang