# Global Convergence and Variance Reduction for a Class of Nonconvex-Nonconcave Minimax Problems

NeurIPS 2020

Abstract

Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as gradient descent ascent (GDA) are the common practice for solving these nonconvex games and have received much empirical success. Yet, it is known that these v…

Introduction

- The authors consider minimax optimization problems of the form
  $$\min_{x \in \mathbb{R}^{d_1}} \; \max_{y \in \mathbb{R}^{d_2}} f(x, y), \qquad (1)$$
  where $f(x, y)$ is a possibly nonconvex-nonconcave function.
- The most frequently used methods for solving minimax problems are the gradient descent ascent (GDA) algorithms, with either simultaneous or alternating updates of the primal and dual variables, referred to as SGDA and AGDA, respectively.
- While these algorithms have received much empirical success especially in adversarial training, it is known that GDA algorithms with constant stepsizes could fail to converge even for the bilinear games [22, 40]; when they do converge, the stable limit point may not be a local Nash equilibrium [13, 38].
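The difference between the simultaneous and alternating updates can be sketched on a toy smooth game. This is an illustrative sketch in our own notation (a strongly-convex-strongly-concave toy objective), not code from the paper:

```python
def sgda_step(x, y, grad_x, grad_y, tau1, tau2):
    """Simultaneous GDA (SGDA): both players update from the same iterate."""
    return x - tau1 * grad_x(x, y), y + tau2 * grad_y(x, y)

def agda_step(x, y, grad_x, grad_y, tau1, tau2):
    """Alternating GDA (AGDA): the ascent step sees the freshly updated x."""
    x_new = x - tau1 * grad_x(x, y)
    return x_new, y + tau2 * grad_y(x_new, y)

# Toy game f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, whose saddle point is (0, 0).
grad_x = lambda x, y: x + y   # df/dx
grad_y = lambda x, y: x - y   # df/dy

x, y = 1.0, 1.0
for _ in range(200):
    x, y = agda_step(x, y, grad_x, grad_y, tau1=0.1, tau2=0.1)
# (x, y) converges toward the saddle point (0, 0)
```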

Highlights

- We consider minimax optimization problems of the form
  $$\min_{x \in \mathbb{R}^{d_1}} \; \max_{y \in \mathbb{R}^{d_2}} f(x, y), \qquad (1)$$
  where $f(x, y)$ is a possibly nonconvex-nonconcave function.
- We prove that variance-reduced alternating gradient descent ascent (VR-AGDA) achieves the complexity of $O\big((n + n^{2/3}\kappa^3)\log(1/\epsilon)\big)$, which improves over the $O\big(n\kappa^3 \log(1/\epsilon)\big)$ complexity of AGDA and the $O(\kappa^5/\epsilon)$ complexity of Stoc-AGDA when applied to finite-sum minimax problems.
- Our numerical experiments further demonstrate that the variance-reduced AGDA algorithm (VR-AGDA) performs significantly better than AGDA and Stoc-AGDA, especially for problems with large condition numbers.
- We identify a subclass of nonconvex-nonconcave minimax problems, represented by the so-called two-sided PL condition, for which AGDA and Stoc-AGDA can converge to global saddle points.
- We propose the first linearly-convergent variance-reduced AGDA algorithm that is provably faster than AGDA, for this subclass of minimax problems
- We hope this work can shed some light on the understanding of nonconvex-nonconcave minimax optimization: (1) different learning rates for two players are essential in gradient descent ascent (GDA) algorithms with alternating updates; (2) convexity-concavity is not a watershed to guarantee global convergence of GDA algorithms
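For context, the two-sided PL (Polyak-Łojasiewicz) condition referenced above is commonly written as follows. This is the standard PL form with moduli μ₁, μ₂; the paper's exact constants and notation may differ:

```latex
% f(., y) satisfies the PL condition in x, uniformly in y:
\|\nabla_x f(x, y)\|^2 \;\ge\; 2\mu_1 \Big[ f(x, y) - \min_{x'} f(x', y) \Big],
% and -f(x, .) satisfies the PL condition in y, uniformly in x:
\|\nabla_y f(x, y)\|^2 \;\ge\; 2\mu_2 \Big[ \max_{y'} f(x, y') - f(x, y) \Big].
```

Neither inequality requires convexity in x or concavity in y, which is what places this subclass strictly beyond the convex-concave regime.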

Results

- The authors compare four algorithms: AGDA, Stoc-AGDA, VR-AGDA and extragradient (EG) with fine-tuned stepsizes.
- The authors observe that VR-AGDA and AGDA both exhibit linear convergence, and the speedup of VR-AGDA is fairly significant when the condition number is large, whereas Stoc-AGDA progresses fast at the beginning and stagnates later on.
- These numerical results clearly validate the theoretical findings.

Conclusion

- The authors identify a subclass of nonconvex-nonconcave minimax problems, represented by the so-called two-sided PL condition, for which AGDA and Stoc-AGDA can converge to global saddle points.
- The authors propose the first linearly-convergent variance-reduced AGDA algorithm that is provably faster than AGDA, for this subclass of minimax problems.
- The authors hope this work can shed some light on the understanding of nonconvex-nonconcave minimax optimization: (1) different learning rates for two players are essential in GDA algorithms with alternating updates; (2) convexity-concavity is not a watershed to guarantee global convergence of GDA algorithms
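A minimal SVRG-style sketch of what a variance-reduced alternating loop looks like for a finite-sum objective f = (1/n) Σᵢ fᵢ. This is our illustration of the general variance-reduction technique, with our own function names and loop sizes; it is not the authors' exact VR-AGDA algorithm or step-size schedule:

```python
import numpy as np

def vr_agda(grads_x, grads_y, x, y, tau1, tau2, epochs=10, inner=None, rng=None):
    """SVRG-style variance-reduced AGDA sketch (hypothetical names, not the paper's).

    grads_x[i](x, y) / grads_y[i](x, y) are the component gradients of
    the finite sum f = (1/n) * sum_i f_i(x, y).
    """
    rng = rng or np.random.default_rng(0)
    n = len(grads_x)
    inner = inner or n
    for _ in range(epochs):
        # Full-gradient snapshot at the epoch anchor (x0, y0).
        x0, y0 = x, y
        mu_x = sum(g(x0, y0) for g in grads_x) / n
        mu_y = sum(g(x0, y0) for g in grads_y) / n
        for _ in range(inner):
            # Variance-reduced descent step for x ...
            i = rng.integers(n)
            vx = grads_x[i](x, y) - grads_x[i](x0, y0) + mu_x
            x = x - tau1 * vx
            # ... then an alternating ascent step for y using the new x.
            j = rng.integers(n)
            vy = grads_y[j](x, y) - grads_y[j](x0, y0) + mu_y
            y = y + tau2 * vy
    return x, y
```

The correction term `- grads_x[i](x0, y0) + mu_x` keeps the stochastic estimate unbiased while shrinking its variance as the iterates approach the snapshot, which is what enables the linear convergence that plain Stoc-AGDA lacks. Note that tau1 and tau2 are kept distinct, in line with the paper's point that different learning rates for the two players are essential.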


Related work

- Nonconvex minimax problems. There has been a recent surge in research on solving minimax optimization beyond the convex-concave regime [54, 8, 51, 56, 30, 47, 1, 32, 3, 48], but these works differ from ours in various respects. Most of them focus on the nonconvex-concave regime and aim for convergence to stationary points of minimax problems [8, 54, 31, 56]. Algorithms in these works require solving the inner maximization or some sub-problems to high accuracy, which differs from AGDA. Lin et al. [30] proposed an inexact proximal point method to find a stationary point for a class of weakly-convex-weakly-concave minimax problems. Their convergence result relies on assuming the existence of a solution to the corresponding Minty variational inequality, which is hard to verify. Abernethy et al. [1] showed the linear convergence of a second-order iterative algorithm, called Hamiltonian gradient descent, for a subclass of "sufficiently bilinear" functions. Very recently, Xu et al. [60] and Bot and Böhm [4] analyzed AGDA in the nonconvex-(strongly-)concave setting. There is also a line of work on understanding the dynamics in minimax games [39, 20, 19, 21, 12, 25].

Funding

- This work was supported in part by ONR grant W911NF-15-1-0479, NSF CCF-1704970, and NSF CMMI-1761699.

Study subjects and analysis

Datasets. We use three datasets in the experiments; two of them are generated in the same way as in Du and Hu [15]. We generate the first dataset with n = 1000 and m = 500 by sampling rows of A from a Gaussian N(0, Iₙ) distribution and setting y₀ = Ax* + ε, with x* drawn from N(0, 1) and the noise ε from N(0, 0.01). The third dataset is generated with A ∈ ℝ^(1000×500) from a Gaussian N(0, Σ), where Σᵢⱼ = 2^(−|i−j|/10), M is rank-deficient with positive eigenvalues sampled from [0.2, 1.8], and λ = 1.5. These three datasets represent cases with low, medium, and high condition numbers, respectively.
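The first and third generation schemes described above can be sketched as follows. This is a sketch under our reading of the text: we take N(0, 0.01) to mean variance 0.01, all variable names are ours, and we omit the matrix M, whose construction is only partially specified:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 500

# First dataset: rows of A i.i.d. standard Gaussian, y0 = A x* + noise.
A1 = rng.standard_normal((n, m))
x_star = rng.standard_normal(m)                # entries from N(0, 1)
noise = rng.normal(0.0, np.sqrt(0.01), n)      # assuming 0.01 is the variance
y0 = A1 @ x_star + noise

# Third dataset: rows of A drawn from N(0, Sigma), Sigma_ij = 2^(-|i-j|/10).
idx = np.arange(m)
Sigma = 2.0 ** (-np.abs(idx[:, None] - idx[None, :]) / 10.0)
L = np.linalg.cholesky(Sigma)                  # Sigma is positive definite
A3 = rng.standard_normal((n, m)) @ L.T         # each row has covariance Sigma
```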

References

- J. Abernethy, K. A. Lai, and A. Wibisono. Last-iterate convergence rates for min-max optimization. arXiv preprint arXiv:1906.02027, 2019.
- J. P. Bailey, G. Gidel, and G. Piliouras. Finite regret and cycles with fixed step-size via alternating gradient descent-ascent. arXiv preprint arXiv:1907.04392, 2019.
- B. Barazandeh and M. Razaviyayn. Solving non-convex non-differentiable min-max games using proximal gradient method. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3162–3166. IEEE, 2020.
- R. I. Bot and A. Böhm. Alternating proximal-gradient steps for (stochastic) nonconvex-concave minimax problems. arXiv preprint arXiv:2007.13605, 2020.
- Q. Cai, M. Hong, Y. Chen, and Z. Wang. On the global convergence of imitation learning: A case for linear quadratic regulator. arXiv preprint arXiv:1901.03674, 2019.
- M. Cassotti, D. Ballabio, V. Consonni, A. Mauri, I. V. Tetko, and R. Todeschini. Prediction of acute aquatic toxicity toward Daphnia magna by using the GA-kNN method. Alternatives to Laboratory Animals, 42(1):31–41, 2014.
- T. Chavdarova, G. Gidel, F. Fleuret, and S. Lacoste-Julien. Reducing noise in gan training with variance reduced extragradient. In Advances in Neural Information Processing Systems, pages 391–401, 2019.
- R. S. Chen, B. Lucier, Y. Singer, and V. Syrgkanis. Robust optimization for non-convex objectives. In Advances in Neural Information Processing Systems, pages 4705–4714, 2017.
- Y. Chen and M. Wang. Stochastic primal-dual methods and sample complexity of reinforcement learning. arXiv preprint arXiv:1612.02516, 2016.
- B. Dai, N. He, Y. Pan, B. Boots, and L. Song. Learning from conditional distributions via dual embeddings. In Artificial Intelligence and Statistics, pages 1458–1467, 2017.
- B. Dai, A. Shaw, L. Li, L. Xiao, N. He, Z. Liu, J. Chen, and L. Song. SBEED: Convergent reinforcement learning with nonlinear function approximation. In Proceedings of the 35th International Conference on Machine Learning, pages 1125–1134, 2018.
- C. Daskalakis and I. Panageas. The limit points of (optimistic) gradient descent in min-max optimization. In Advances in Neural Information Processing Systems, pages 9236–9246, 2018.
- C. Daskalakis, A. Ilyas, V. Syrgkanis, and H. Zeng. Training gans with optimism. In International Conference on Learning Representations, 2018.
- S. Du, J. Lee, H. Li, L. Wang, and X. Zhai. Gradient descent finds global minima of deep neural networks. In International Conference on Machine Learning, pages 1675–1685, 2019.
- S. S. Du and W. Hu. Linear convergence of the primal-dual gradient method for convex-concave saddle point problems without strong convexity. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 196–205, 2019.
- L. El Ghaoui and H. Lebret. Robust solutions to least-squares problems with uncertain data. SIAM Journal on matrix analysis and applications, 18(4):1035–1064, 1997.
- F. Facchinei and J.-S. Pang. Finite-dimensional variational inequalities and complementarity problems. Springer Science & Business Media, 2007.
- M. Fazel, R. Ge, S. Kakade, and M. Mesbahi. Global convergence of policy gradient methods for the linear quadratic regulator. In International Conference on Machine Learning, pages 1467–1476, 2018.
- T. Fiez and L. Ratliff. Gradient descent-ascent provably converges to strict local minmax equilibria with a finite timescale separation. arXiv preprint arXiv:2009.14820, 2020.
- T. Fiez, B. Chasnov, and L. J. Ratliff. Convergence of learning dynamics in stackelberg games. arXiv preprint arXiv:1906.01217, 2019.
- T. Fiez, B. Chasnov, and L. Ratliff. Implicit learning dynamics in stackelberg games: Equilibria characterization, convergence analysis, and empirical study. In International Conference on Machine Learning (ICML), 2020.
- G. Gidel, R. A. Hemmat, M. Pezeshki, R. Le Priol, G. Huang, S. Lacoste-Julien, and I. Mitliagkas. Negative momentum for improved game dynamics. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1802–1811, 2019.
- I. Goodfellow, Y. Bengio, and A. Courville. Deep learning. MIT press, 2016.
- Z. Guo, Z. Yuan, Y. Yan, and T. Yang. Fast objective and duality gap convergence for non-convex strongly-concave min-max problems. arXiv preprint arXiv:2006.06889, 2020.
- Y.-P. Hsieh, P. Mertikopoulos, and V. Cevher. The limits of min-max optimization algorithms: convergence to spurious non-critical sets. arXiv preprint arXiv:2006.09065, 2020.
- C. Jin, P. Netrapalli, and M. I. Jordan. What is local optimality in nonconvex-nonconcave minimax optimization? arXiv preprint arXiv:1902.00618, 2019.
- R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in neural information processing systems, pages 315–323, 2013.
- H. Karimi, J. Nutini, and M. Schmidt. Linear convergence of gradient and proximal-gradient methods under the polyak-łojasiewicz condition. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 795–811.
- G. Korpelevich. The extragradient method for finding saddle points and other problems. Matecon, 12:747–756, 1976.
- Q. Lin, M. Liu, H. Rafique, and T. Yang. Solving weakly-convex-weakly-concave saddlepoint problems as successive strongly monotone variational inequalities. arXiv preprint arXiv:1810.10207, 2018.
- T. Lin, C. Jin, and M. I. Jordan. On gradient descent ascent for nonconvex-concave minimax problems. arXiv preprint arXiv:1906.00331, 2019.
- T. Lin, C. Jin, M. Jordan, et al. Near-optimal algorithms for minimax optimization. arXiv preprint arXiv:2002.02417, 2020.
- M.-Y. Liu and O. Tuzel. Coupled generative adversarial networks. In Advances in neural information processing systems, pages 469–477, 2016.
- L. Luo, C. Chen, Y. Li, G. Xie, and Z. Zhang. A stochastic proximal point algorithm for saddle-point problems. arXiv preprint arXiv:1909.06946, 2019.
- L. Luo, H. Ye, and T. Zhang. Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems. arXiv preprint arXiv:2001.03724, 2020.
- Z.-Q. Luo and P. Tseng. Error bounds and convergence analysis of feasible descent methods: a general approach. Annals of Operations Research, 46(1):157–178, 1993.
- A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
- E. Mazumdar and L. J. Ratliff. On the convergence of gradient-based learning in continuous games. ArXiv e-prints, 2018.
- E. Mazumdar, L. J. Ratliff, and S. S. Sastry. On gradient-based learning in continuous games. SIAM Journal on Mathematics of Data Science, 2(1):103–131, 2020.
- L. Mescheder, A. Geiger, and S. Nowozin. Which training methods for gans do actually converge? In International Conference on Machine Learning, pages 3481–3490, 2018.
- L. Metz, B. Poole, D. Pfau, and J. Sohl-Dickstein. Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163, 2016.
- H. Namkoong and J. C. Duchi. Stochastic gradient methods for distributionally robust optimization with f-divergences. In Advances in Neural Information Processing Systems, pages 2208–2216, 2016.
- H. Namkoong and J. C. Duchi. Variance-based regularization with convex objectives. In Advances in Neural Information Processing Systems, pages 2971–2980, 2017.
- J. Nash. Two-person cooperative games. Econometrica: Journal of the Econometric Society, pages 128–140, 1953.
- I. Necoara, Y. Nesterov, and F. Glineur. Linear convergence of first order methods for nonstrongly convex optimization. Mathematical Programming, pages 1–39, 2018.
- Y. Nesterov and L. Scrimali. Solving strongly monotone variational and quasi-variational inequalities. Available at SSRN 970903, 2006.
- M. Nouiehed, M. Sanjabi, T. Huang, J. D. Lee, and M. Razaviyayn. Solving a class of nonconvex min-max games using iterative first order methods. In Advances in Neural Information Processing Systems, pages 14905–14916, 2019.
- D. M. Ostrovskii, A. Lowy, and M. Razaviyayn. Efficient search of first-order nash equilibria in nonconvex-concave smooth min-max problems. arXiv preprint arXiv:2002.07919, 2020.
- B. Palaniappan and F. Bach. Stochastic variance reduction methods for saddle-point problems. In Advances in Neural Information Processing Systems, pages 1416–1424, 2016.
- B. T. Polyak. Gradient methods for minimizing functionals. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, 3(4):643–653, 1963.
- Q. Qian, S. Zhu, J. Tang, R. Jin, B. Sun, and H. Li. Robust optimization over multiple domains. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4739–4746, 2019.
- S. J. Reddi, A. Hefny, S. Sra, B. Poczos, and A. Smola. Stochastic variance reduction for nonconvex optimization. In International Conference on Machine Learning, pages 314–323, 2016.
- S. J. Reddi, S. Sra, B. Póczos, and A. Smola. Fast incremental method for smooth nonconvex optimization. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 1971–1977. IEEE, 2016.
- A. Sinha, H. Namkoong, and J. Duchi. Certifying some distributional robustness with principled adversarial training. In International Conference on Learning Representations, 2018.
- J. Sun, Q. Qu, and J. Wright. A geometric analysis of phase retrieval. Foundations of Computational Mathematics, 18(5):1131–1198, 2018.
- K. K. Thekumparampil, P. Jain, P. Netrapalli, and S. Oh. Efficient algorithms for smooth minimax optimization. In Advances in Neural Information Processing Systems, pages 12659– 12670, 2019.
- J. Von Neumann, O. Morgenstern, and H. W. Kuhn. Theory of games and economic behavior (commemorative edition). Princeton university press, 2007.
- L. Xiao and T. Zhang. A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization, 24(4):2057–2075, 2014.
- T. Xu, Z. Wang, Y. Liang, and H. V. Poor. Enhanced first and zeroth order variance reduced algorithms for min-max optimization. arXiv preprint arXiv:2006.09361, 2020.
- Z. Xu, H. Zhang, Y. Xu, and G. Lan. A unified single-loop alternating gradient projection algorithm for nonconvex-concave and convex-nonconcave minimax problems. arXiv preprint arXiv:2006.02032, 2020.
- H. Zhang and W. Yin. Gradient methods for convex minimization: better rates under weaker conditions. arXiv preprint arXiv:1303.4645, 2013.
- J. Zhang, M. Hong, and S. Zhang. On lower iteration complexity bounds for the saddle point problems. arXiv preprint arXiv:1912.07481, 2019.
- K. Zhang, Z. Yang, and T. Basar. Policy optimization provably converges to nash equilibria in zero-sum linear quadratic games. In Advances in Neural Information Processing Systems, pages 11602–11614, 2019.
- Y. Zhou, H. Zhang, and Y. Liang. Geometrical properties and accelerated gradient solvers of non-convex phase retrieval. In 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 331–335. IEEE, 2016.
