
# A Catalyst Framework for Minimax Optimization

NeurIPS 2020


## Abstract

We introduce a generic two-loop scheme for smooth minimax optimization with strongly-convex-concave objectives. Our approach applies the accelerated proximal point framework (or Catalyst) to the associated dual problem and takes full advantage of existing gradient-based algorithms to solve a sequence of well-balanced strongly-convex-strongly-concave subproblems. …


## Introduction

- Minimax optimization has been extensively studied in past decades in the communities of mathematics, economics, and operations research.
- It is unclear how these sophisticated algorithms can be integrated with variance-reduction techniques to solve strongly-convex-concave minimax problems with finite-sum structure efficiently.
- Most existing variance-reduced algorithms in minimax optimization focus on the strongly-convex-strongly-concave setting, e.g., SVRG and SAGA [35], SPD1-VR [40], SVRE [5], Point-SAGA [23], primal-dual SVRG [10], etc.

## Highlights

- Recent years have witnessed a surge of its applications in machine learning, including generative adversarial networks [14], adversarial training [44, 25], distributionally robust optimization [28, 1], reinforcement learning [7, 8], and many others
- To the best of our knowledge, the design of efficient variance-reduction methods for finite-sum structured minimax problems under the strongly-convex-concave or nonconvex-concave settings remains largely unexplored. This raises the question: can we leverage the rich set of off-the-shelf methods designed for strongly-convex-strongly-concave minimax problems in these unexplored settings of interest? Inspired by the success of the Catalyst framework, which uses gradient-based algorithms originally designed for strongly convex minimization problems to minimize convex/nonconvex objectives [19, 36], we introduce a generic Catalyst framework for minimax optimization.
- (ii) For nonconvex-concave minimax optimization, we provide a simple two-time-scale inexact proximal point algorithm for finding an $\epsilon$-stationary point that matches the state-of-the-art complexity of $\tilde{O}(\epsilon^{-3})$.
- We discuss the optimal choice of the regularization parameter τ for different settings
- When extended to the nonconvex-concave minimax optimization, our algorithm again achieves the state-of-the-art complexity for finding a stationary point
- A key observation is that by setting $\tau = \mu$, the auxiliary problem becomes $(\mu, \mu)$-SC-SC, and it is known that the simple extragradient method (EG) or optimistic gradient descent ascent (OGDA) achieves the optimal complexity for solving this class of well-balanced SC-SC problems [47].
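To make the two-loop idea concrete, here is a minimal sketch of an outer proximal-point loop that adds a $-\frac{\tau}{2}\|y - z\|^2$ regularizer and calls an inner extragradient solver. This is an illustration, not the authors' exact algorithm: the non-accelerated prox-center update, the toy objective, and all function names are assumptions.

```python
def extragradient(gx, gy, x, y, eta, iters):
    """Inner solver: extragradient (EG) for a smooth SC-SC saddle problem.
    gx, gy return the partial gradients; descent in x, ascent in y."""
    for _ in range(iters):
        xh = x - eta * gx(x, y)          # extrapolation (half) step
        yh = y + eta * gy(x, y)
        x = x - eta * gx(xh, yh)         # update step at the midpoint
        y = y + eta * gy(xh, yh)
    return x, y

def catalyst_minimax(gx, gy, x0, y0, tau, eta, outer, inner):
    """Outer loop: repeatedly solve the auxiliary SC-SC problem
    min_x max_y f(x, y) - tau/2 * ||y - z||^2, then move the prox center z.
    (Plain, non-accelerated variant, for illustration only.)"""
    x, y, z = x0, y0, y0
    for _ in range(outer):
        gy_reg = lambda xx, yy, z=z: gy(xx, yy) - tau * (yy - z)
        x, y = extragradient(gx, gy_reg, x, y, eta, inner)
        z = y
    return x, y

# Toy strongly-convex-concave problem: f(x, y) = 0.5*mu*x^2 + x*y
mu, tau = 1.0, 1.0                        # tau = mu: well-balanced SC-SC
gx = lambda x, y: mu * x + y
gy = lambda x, y: x
x, y = catalyst_minimax(gx, gy, 1.0, 1.0, tau, eta=0.2, outer=50, inner=50)
# (x, y) approaches the saddle point (0, 0)
```

With exact inner solves, each outer step of this toy problem halves the prox center, so convergence is linear; the warm-started inner EG loop keeps the subproblem error negligible.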

## Results

- Inspired by the success of the Catalyst framework that uses gradient-based algorithms originally designed for strongly convex minimization problems to minimize convex/nonconvex objectives [19, 36], the authors introduce a generic Catalyst framework for minimax optimization.
- Rooted in an inexact accelerated proximal point framework, the idea is to repeatedly solve an auxiliary strongly-convex-strongly-concave problem of the form $\min_{x\in\mathcal{X}}\max_{y\in\mathcal{Y}}\; f(x,y) - \frac{\tau}{2}\|y - z_t\|^2$ (with $z_t$ the current prox center) using an existing method M.
- The MINIMAX-APPA algorithm [21] uses $\tau_x = 1$ and a vanishing $\tau_y = O(\epsilon)$, which results in extra complications in solving the auxiliary problems.
- Based on the generic Catalyst framework, the authors establish a number of interesting results: (i) for strongly-convex-concave minimax optimization, they develop a family of two-loop algorithms with near-optimal complexity and a reduced order of the logarithmic factor; in particular, combining Catalyst with the extragradient method yields the complexity $\tilde{O}\!\big(\ell/\sqrt{\mu\epsilon}\big)$.
- The authors focus on solving strongly-convex-concave minimax problems and introduce a general Catalyst scheme.
- The subroutines used to solve the auxiliary minimax problems and the choices of regularization parameters in these works are quite distinct from the authors'.
- Suppose Assumptions 1 and 2 hold, and the subproblems are solved by a linearly convergent algorithm M to satisfy the stopping criterion (3) or (6) with accuracy $\epsilon^{(t)}$ as specified in Theorem 3.1.
- The subproblem inherits the finite-sum structure of the original objective and can be solved by a number of linearly convergent variance-reduced algorithms, such as SVRG, SAGA [35], and SVRE [5].
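For the finite-sum case, the variance-reduction idea can be sketched with an SVRG-style estimator of the saddle operator. This is a generic illustration of the estimator, not the paper's SVRG/SAGA/SVRE subroutines; all names and the toy components are assumptions.

```python
import random

def svrg_saddle_estimator(ops, w, w_snap, full_snap):
    """SVRG-style estimate of the saddle operator G(w) = (1/n) * sum_i G_i(w),
    where G_i(w) stacks (grad_x f_i, -grad_y f_i). Unbiased: its expectation
    over the sampled index i equals the full operator at w."""
    i = random.randrange(len(ops))
    gi_w = ops[i](w)
    gi_snap = ops[i](w_snap)
    return [a - b + c for a, b, c in zip(gi_w, gi_snap, full_snap)]

# Toy check of unbiasedness: average the estimator over every index i.
ops = [lambda w, k=k: [k * w[0], -k * w[1]] for k in (1.0, 2.0, 3.0)]
n = len(ops)
full = lambda w: [sum(op(w)[j] for op in ops) / n for j in range(2)]
w, w_snap = [1.0, 2.0], [0.5, 0.5]
full_snap = full(w_snap)
avg = [sum(ops[i](w)[j] - ops[i](w_snap)[j] + full_snap[j] for i in range(n)) / n
       for j in range(2)]
# avg equals full(w), since the snapshot terms cancel in expectation
```

The cancellation `G_i(w) - G_i(w_snap) + G(w_snap)` is what lets such estimators achieve linear convergence on the well-balanced SC-SC subproblems.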

## Conclusion

- If the authors choose the regularization parameters $\tau_x$ and $\tau_y$ as prescribed and use EG/OGDA/GDA to solve the subproblems, Algorithm 2 finds an $\epsilon$-stationary point with a total number of gradient evaluations of $\tilde{O}(\epsilon^{-3})$.
- The authors compare four algorithms: extragradient (EG), SVRG, Catalyst-EG, and Catalyst-SVRG, with best-tuned stepsizes, and evaluate their errors based on (a) the distance to the limit point, $\|p_t - p^*\| + \|\sigma_t - \sigma^*\|$, and (b) the norm of the gradient mapping, $\|\nabla_p f(p_t,\sigma_t)\| + \|\sigma_t - P_\Sigma(\sigma_t + \beta \nabla_\sigma f(p_t,\sigma_t))\|/\beta$.
- While EG with averaged iterates has an optimal complexity of $O(1/\epsilon)$ for solving convex-concave minimax problems [29], its convergence behavior for SC-C minimax optimization remains unknown.
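The gradient-mapping error above is straightforward to compute; here is a small sketch, where the projection $P_\Sigma$ is a hypothetical box projection onto $[0,1]^d$ and the toy objective is an assumption for illustration.

```python
import math

def proj_box(v, lo=0.0, hi=1.0):
    """Hypothetical projection P_Sigma: clip each coordinate to [lo, hi]."""
    return [min(max(c, lo), hi) for c in v]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def gradient_mapping_error(grad_p, grad_s, p, s, beta):
    """||grad_p f(p, s)|| + ||s - P_Sigma(s + beta * grad_s f(p, s))|| / beta."""
    step = [si + beta * gi for si, gi in zip(s, grad_s(p, s))]  # ascent step in s
    proj = proj_box(step)
    return norm(grad_p(p, s)) + norm([si - pi for si, pi in zip(s, proj)]) / beta

# Toy f(p, s) = 0.5*||p||^2 + <p, s>, with stationary point (p, s) = (0, 0)
grad_p = lambda p, s: [pi + si for pi, si in zip(p, s)]
grad_s = lambda p, s: list(p)
err = gradient_mapping_error(grad_p, grad_s, [0.0], [0.0], beta=0.5)
# err == 0.0 at the stationary point
```

The second term vanishes exactly when a projected ascent step of size $\beta$ leaves $s$ unchanged, which is why it serves as a stationarity measure on the constrained variable.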


- Table 1: Comparison with other algorithms for the general strongly-convex-concave setting. For simplicity, we ignore the dependency on $\ell$ and $\mu$ inside the log. The lower complexity bound is $\Omega\!\big(\ell/\sqrt{\mu\epsilon}\big)$.
- Table 2: Summary of the optimal choice of the regularization parameter $\tau$ and the total complexity of the proposed Catalyst framework for finite-sum SC-C minimax optimization with $f(x,y) = \frac{1}{n}\sum_{i=1}^{n} f_i(x,y)$.

## Funding

- This work was supported in part by ONR grant W911NF-15-1-0479, NSF CCF-1704970, and NSF CMMI-1761699.

## References

- S. S. Abadeh, P. M. M. Esfahani, and D. Kuhn. Distributionally robust logistic regression. In Advances in Neural Information Processing Systems, pages 1576–1584, 2015.
- W. Azizian, I. Mitliagkas, S. Lacoste-Julien, and G. Gidel. A tight and unified analysis of extragradient for a whole spectrum of differentiable games. arXiv preprint arXiv:1906.05945, 2019.
- S. Boyd, S. P. Boyd, and L. Vandenberghe. Convex optimization. Cambridge university press, 2004.
- A. Chambolle and T. Pock. On the ergodic convergence rates of a first-order primal–dual algorithm. Mathematical Programming, 159(1-2):253–287, 2016.
- T. Chavdarova, G. Gidel, F. Fleuret, and S. Lacoste-Julien. Reducing noise in gan training with variance reduced extragradient. In Advances in Neural Information Processing Systems, pages 391–401, 2019.
- Y. Chen, G. Lan, and Y. Ouyang. Optimal primal-dual methods for a class of saddle point problems. SIAM Journal on Optimization, 24(4):1779–1814, 2014.
- B. Dai, N. He, Y. Pan, B. Boots, and L. Song. Learning from conditional distributions via dual embeddings. In Artificial Intelligence and Statistics, pages 1458–1467, 2017.
- B. Dai, A. Shaw, L. Li, L. Xiao, N. He, Z. Liu, J. Chen, and L. Song. SBEED: Convergent reinforcement learning with nonlinear function approximation. In International Conference on Machine Learning, pages 1125–1134, 2018.
- D. Davis and D. Drusvyatskiy. Stochastic subgradient method converges at the rate $O(k^{-1/4})$ on weakly convex functions. arXiv preprint arXiv:1802.02988, 2018.
- S. S. Du and W. Hu. Linear convergence of the primal-dual gradient method for convex-concave saddle point problems without strong convexity. arXiv preprint arXiv:1802.01504, 2018.
- F. Facchinei and J.-S. Pang. Finite-dimensional variational inequalities and complementarity problems. Springer Science & Business Media, 2007.
- A. Garnaev and W. Trappe. An eavesdropping game with sinr as an objective function. In International Conference on Security and Privacy in Communication Systems, pages 142–162.
- G. Gidel, H. Berard, G. Vignoud, P. Vincent, and S. Lacoste-Julien. A variational inequality perspective on generative adversarial networks. arXiv preprint arXiv:1802.10551, 2018.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
- N. He, A. Juditsky, and A. Nemirovski. Mirror prox algorithm for multi-term composite minimization and semi-separable problems. Computational Optimization and Applications, 61 (2):275–319, 2015.
- M. Kang, M. Kang, and M. Jung. Inexact accelerated augmented lagrangian methods. Computational Optimization and Applications, 62(2):373–404, 2015.
- W. Kong and R. D. Monteiro. An accelerated inexact proximal point method for solving nonconvex-concave min-max problems. arXiv preprint arXiv:1905.13433, 2019.
- G. Korpelevich. The extragradient method for finding saddle points and other problems. Matecon, 12:747–756, 1976.
- H. Lin, J. Mairal, and Z. Harchaoui. Catalyst acceleration for first-order convex optimization: from theory to practice. The Journal of Machine Learning Research, 18(1):7854–7907, 2017.
- Q. Lin, M. Liu, H. Rafique, and T. Yang. Solving weakly-convex-weakly-concave saddlepoint problems as successive strongly monotone variational inequalities. arXiv preprint arXiv:1810.10207, 2018.
- T. Lin, C. Jin, and M. Jordan. Near-optimal algorithms for minimax optimization. arXiv preprint arXiv:2002.02417, 2020.
- S. Lu, I. Tsaknakis, M. Hong, and Y. Chen. Hybrid block successive approximation for onesided non-convex min-max problems: algorithms and applications. IEEE Transactions on Signal Processing, 2020.
- L. Luo, C. Chen, Y. Li, G. Xie, and Z. Zhang. A stochastic proximal point algorithm for saddle-point problems. arXiv preprint arXiv:1909.06946, 2019.
- L. Luo, H. Ye, and T. Zhang. Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems. arXiv preprint arXiv:2001.03724, 2020.
- A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
- A. Mokhtari, A. Ozdaglar, and S. Pattathil. A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach. arXiv preprint arXiv:1901.08511, 2019.
- R. D. Monteiro and B. F. Svaiter. On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean. SIAM Journal on Optimization, 20(6):2755–2787, 2010.
- H. Namkoong and J. C. Duchi. Variance-based regularization with convex objectives. In Advances in Neural Information Processing Systems, pages 2971–2980, 2017.
- A. Nemirovski. Prox-method with rate of convergence $O(1/t)$ for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229–251, 2004.
- A. Nemirovsky and D. Yudin. Problem complexity and method efficiency in optimization. 1983.
- Y. Nesterov. Dual extrapolation and its applications to solving variational inequalities and related problems. Mathematical Programming, 109(2-3):319–344, 2007.
- Y. Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2013.
- D. M. Ostrovskii, A. Lowy, and M. Razaviyayn. Efficient search of first-order nash equilibria in nonconvex-concave smooth min-max problems. arXiv preprint arXiv:2002.07919, 2020.
- Y. Ouyang and Y. Xu. Lower complexity bounds of first-order methods for convex-concave bilinear saddle-point problems. Mathematical Programming, pages 1–35, 2019.
- B. Palaniappan and F. Bach. Stochastic variance reduction methods for saddle-point problems. In Advances in Neural Information Processing Systems, pages 1416–1424, 2016.
- C. Paquette, H. Lin, D. Drusvyatskiy, J. Mairal, and Z. Harchaoui. Catalyst acceleration for gradient-based non-convex optimization. arXiv preprint arXiv:1703.10993, 2017.
- H. Rafique, M. Liu, Q. Lin, and T. Yang. Non-convex min-max optimization: Provable algorithms and applications in machine learning. arXiv preprint arXiv:1810.02060, 2018.
- R. T. Rockafellar. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5):877–898, 1976.
- M. Sibony. Méthodes itératives pour les équations et inéquations aux dérivées partielles non linéaires de type monotone. Calcolo, 7(1-2):65–183, 1970.
- C. Tan, T. Zhang, S. Ma, and J. Liu. Stochastic primal-dual method for empirical risk minimization with $O(1)$ per-iteration complexity. In Advances in Neural Information Processing Systems, pages 8366–8375, 2018.
- K. K. Thekumparampil, P. Jain, P. Netrapalli, and S. Oh. Efficient algorithms for smooth minimax optimization. In Advances in Neural Information Processing Systems, pages 12659– 12670, 2019.
- P. Tseng. On linear convergence of iterative methods for the variational inequality problem. Journal of Computational and Applied Mathematics, 60(1-2):237–252, 1995.
- P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. submitted to SIAM Journal on Optimization, 1, 2008.
- J. Wang, T. Zhang, S. Liu, P.-Y. Chen, J. Xu, M. Fardad, and B. Li. Towards a unified min-max framework for adversarial exploration and robustness.
- Z. Xie and J. Shi. Accelerated primal dual method for a class of saddle point problem with strongly convex component. arXiv preprint arXiv:1906.07691, 2019.
- J. Yang, N. Kiyavash, and N. He. Global convergence and variance-reduced optimization for a class of nonconvex-nonconcave minimax problems. arXiv preprint arXiv:2002.09621, 2020.
- J. Zhang, M. Hong, and S. Zhang. On lower iteration complexity bounds for the saddle point problems. arXiv preprint arXiv:1912.07481, 2019.
- R. Zhao. A primal dual smoothing framework for max-structured nonconvex optimization. arXiv preprint arXiv:2003.04375, 2020.
