TL;DR:
We show that for a subclass of nonconvex-nonconcave objectives satisfying a so-called two-sided Polyak-Łojasiewicz inequality, the alternating gradient descent ascent algorithm converges globally at a linear rate and the stochastic AGDA achieves a sublinear rate

Global Convergence and Variance Reduction for a Class of Nonconvex-Nonconcave Minimax Problems

NeurIPS 2020


Abstract

Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as the gradient descent ascent (GDA) are the common practice for solving these nonconvex games and receive lots of empirical success. Yet, it is known that these v…

Introduction
  • The authors consider minimax optimization problems of the form

    min_{x∈R^{d1}} max_{y∈R^{d2}} f(x, y),    (1)

    where f(x, y) is a possibly nonconvex-nonconcave function.
  • The most frequently used methods for solving minimax problems are the gradient descent ascent (GDA) algorithms, with either simultaneous or alternating updates of the primal-dual variables, referred to as SGDA and AGDA, respectively
  • While these algorithms have received much empirical success, especially in adversarial training, it is known that GDA algorithms with constant stepsizes can fail to converge even for bilinear games [22, 40]; when they do converge, the stable limit point may not be a local Nash equilibrium [13, 38].
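As a concrete illustration of the alternating scheme described above, here is a minimal AGDA sketch on a toy quadratic saddle-point problem; the toy objective, stepsizes, and function names are illustrative choices, not taken from the paper.

```python
def agda(grad_x, grad_y, x0, y0, tau1, tau2, iters):
    """Alternating GDA: descend in x, then ascend in y at the *updated* x."""
    x, y = x0, y0
    for _ in range(iters):
        x = x - tau1 * grad_x(x, y)   # primal descent step
        y = y + tau2 * grad_y(x, y)   # dual ascent step sees the new x
    return x, y

# Toy objective f(x, y) = x^2/2 + x*y - y^2/2, unique saddle point at (0, 0).
gx = lambda x, y: x + y   # df/dx
gy = lambda x, y: x - y   # df/dy

x, y = agda(gx, gy, x0=1.0, y0=1.0, tau1=0.05, tau2=0.2, iters=400)
```

Note the distinct stepsizes for the two players (tau1 != tau2), echoing the paper's observation that different learning rates matter for alternating updates.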
Highlights
  • We consider minimax optimization problems of the form

    min_{x∈R^{d1}} max_{y∈R^{d2}} f(x, y),    (1)

    where f(x, y) is a possibly nonconvex-nonconcave function
  • We prove that variance-reduced AGDA (VR-AGDA) achieves a complexity of O((n + n^{2/3}κ^3) log(1/ε)), which improves over the O(nκ^3 log(1/ε)) complexity of AGDA and the O(κ^5/ε) complexity of Stoc-AGDA when applied to finite-sum minimax problems
  • Our numerical experiments further demonstrate that variance-reduced AGDA algorithm (VR-AGDA) performs significantly better than AGDA and Stoc-AGDA, especially for problems with large condition numbers
  • We identify a subclass of nonconvex-nonconcave minimax problems, characterized by the so-called two-sided PL condition, for which AGDA and Stoc-AGDA can converge to global saddle points
  • We propose the first linearly-convergent variance-reduced AGDA algorithm that is provably faster than AGDA, for this subclass of minimax problems
  • We hope this work can shed some light on the understanding of nonconvex-nonconcave minimax optimization: (1) different learning rates for the two players are essential in gradient descent ascent (GDA) algorithms with alternating updates; (2) convexity-concavity is not the watershed for global convergence of GDA algorithms.
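For reference, a common way to write a two-sided PL (Polyak-Łojasiewicz) inequality of the kind invoked in the highlights is the following; the exact constants and normalization used in the paper may differ:

```latex
% f satisfies a two-sided PL condition with moduli \mu_1, \mu_2 > 0 if,
% for all (x, y):
\|\nabla_x f(x, y)\|^2 \ge 2\mu_1 \big( f(x, y) - \min_{x'} f(x', y) \big), \\
\|\nabla_y f(x, y)\|^2 \ge 2\mu_2 \big( \max_{y'} f(x, y') - f(x, y) \big).
```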
Results
  • The authors compare four algorithms: AGDA, Stoc-AGDA, VR-AGDA and extragradient (EG) with fine-tuned stepsizes.
  • The authors observe that VR-AGDA and AGDA both exhibit linear convergence, and the speedup of VR-AGDA is fairly significant when the condition number is large, whereas Stoc-AGDA progresses fast at the beginning and stagnates later on.
  • These numerical results clearly validate the theoretical findings.
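For intuition about the variance-reduced variant compared above, here is a minimal SVRG-style AGDA sketch for a finite-sum objective f = (1/n) Σ_i f_i; the snapshot schedule, stepsizes, and toy components are illustrative assumptions and may not match the paper's exact VR-AGDA.

```python
import numpy as np

def vr_agda(gx_i, gy_i, n, x0, y0, tau1, tau2, epochs, inner, seed=0):
    """SVRG-style alternating updates: each epoch recomputes the full
    gradients at a snapshot; inner steps use variance-reduced estimators."""
    rng = np.random.default_rng(seed)
    x, y = x0, y0
    for _ in range(epochs):
        sx, sy = x, y  # snapshot point
        fgx = sum(gx_i(i, sx, sy) for i in range(n)) / n
        fgy = sum(gy_i(i, sx, sy) for i in range(n)) / n
        for _ in range(inner):
            i = int(rng.integers(n))
            x = x - tau1 * (gx_i(i, x, y) - gx_i(i, sx, sy) + fgx)
            y = y + tau2 * (gy_i(i, x, y) - gy_i(i, sx, sy) + fgy)
    return x, y

# Toy finite sum: f_i(x, y) = a_i*x^2/2 + b_i*x*y - c_i*y^2/2, saddle at (0, 0).
rng = np.random.default_rng(1)
n = 50
a, b, c = (rng.uniform(0.5, 1.5, n) for _ in range(3))
gx_i = lambda i, x, y: a[i] * x + b[i] * y
gy_i = lambda i, x, y: b[i] * x - c[i] * y
x, y = vr_agda(gx_i, gy_i, n, 1.0, 1.0, tau1=0.05, tau2=0.05,
               epochs=30, inner=n)
```

The gradient estimator has zero variance at the snapshot, which is what lets the stepsizes stay constant and the iterates converge linearly rather than stagnating like plain Stoc-AGDA.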
Conclusion
  • The authors identify a subclass of nonconvex-nonconcave minimax problems, characterized by the so-called two-sided PL condition, for which AGDA and Stoc-AGDA can converge to global saddle points.
  • The authors propose the first linearly-convergent variance-reduced AGDA algorithm that is provably faster than AGDA, for this subclass of minimax problems.
  • The authors hope this work can shed some light on the understanding of nonconvex-nonconcave minimax optimization: (1) different learning rates for the two players are essential in GDA algorithms with alternating updates; (2) convexity-concavity is not the watershed for global convergence of GDA algorithms.
Related work
  • Nonconvex minimax problems. There has been a recent surge of research on solving minimax optimization beyond the convex-concave regime [54, 8, 51, 56, 30, 47, 1, 32, 3, 48], but these works differ from ours in various respects. Most of them focus on the nonconvex-concave regime and aim for convergence to stationary points of minimax problems [8, 54, 31, 56]. The algorithms in these works require solving the inner maximization or some sub-problems to high accuracy, which differs from AGDA. Lin et al. [30] proposed an inexact proximal point method to find a stationary point for a class of weakly-convex-weakly-concave minimax problems; their convergence result relies on assuming the existence of a solution to the corresponding Minty variational inequality, which is hard to verify. Abernethy et al. [1] showed the linear convergence of a second-order iterative algorithm, called Hamiltonian gradient descent, for a subclass of "sufficiently bilinear" functions. Very recently, Xu et al. [60] and Bot and Böhm [4] analyzed AGDA in the nonconvex-(strongly-)concave setting. There is also a line of work on understanding the dynamics in minimax games [39, 20, 19, 21, 12, 25].
Funding
  • Acknowledgments and Disclosure of Funding: This work was supported in part by ONR grant W911NF-15-1-0479, NSF CCF-1704970, and NSF CMMI-1761699.
Study subjects and analysis
Datasets: 3
We use three datasets in the experiments; two of them are generated in the same way as in Du and Hu [15]. The first dataset is generated with n = 1000 and m = 500 by sampling the rows of A from a Gaussian N(0, I_n) and setting y0 = Ax* + ε, with the entries of x* drawn from N(0, 1) and the noise ε from N(0, 0.01). The third dataset is generated with A ∈ R^{1000×500} from a Gaussian N(0, Σ) with Σ_{i,j} = 2^{−|i−j|/10}, M rank-deficient with positive eigenvalues sampled from [0.2, 1.8], and λ = 1.5. These three datasets represent cases with low, medium, and high condition numbers, respectively.
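The first synthetic dataset described above can be reproduced along these lines; the matrix orientation (rows of A in R^m) and the use of standard deviation 0.1 for the N(0, 0.01) noise are our reading of the description, not verified against the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 500

# Rows of A from a standard Gaussian; y0 = A x* + eps as described above.
A = rng.normal(size=(n, m))
x_star = rng.normal(size=m)          # entries ~ N(0, 1)
eps = rng.normal(scale=0.1, size=n)  # N(0, 0.01): variance 0.01 -> std 0.1
y0 = A @ x_star + eps
```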

References
  • J. Abernethy, K. A. Lai, and A. Wibisono. Last-iterate convergence rates for min-max optimization. arXiv preprint arXiv:1906.02027, 2019.
  • J. P. Bailey, G. Gidel, and G. Piliouras. Finite regret and cycles with fixed step-size via alternating gradient descent-ascent. arXiv preprint arXiv:1907.04392, 2019.
  • B. Barazandeh and M. Razaviyayn. Solving non-convex non-differentiable min-max games using proximal gradient method. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3162–3166. IEEE, 2020.
  • R. I. Bot and A. Böhm. Alternating proximal-gradient steps for (stochastic) nonconvex-concave minimax problems. arXiv preprint arXiv:2007.13605, 2020.
  • Q. Cai, M. Hong, Y. Chen, and Z. Wang. On the global convergence of imitation learning: A case for linear quadratic regulator. arXiv preprint arXiv:1901.03674, 2019.
  • M. Cassotti, D. Ballabio, V. Consonni, A. Mauri, I. V. Tetko, and R. Todeschini. Prediction of acute aquatic toxicity toward Daphnia magna by using the GA-kNN method. Alternatives to Laboratory Animals, 42(1):31–41, 2014.
  • T. Chavdarova, G. Gidel, F. Fleuret, and S. Lacoste-Julien. Reducing noise in gan training with variance reduced extragradient. In Advances in Neural Information Processing Systems, pages 391–401, 2019.
  • R. S. Chen, B. Lucier, Y. Singer, and V. Syrgkanis. Robust optimization for non-convex objectives. In Advances in Neural Information Processing Systems, pages 4705–4714, 2017.
  • Y. Chen and M. Wang. Stochastic primal-dual methods and sample complexity of reinforcement learning. arXiv preprint arXiv:1612.02516, 2016.
  • B. Dai, N. He, Y. Pan, B. Boots, and L. Song. Learning from conditional distributions via dual embeddings. In Artificial Intelligence and Statistics, pages 1458–1467, 2017.
  • B. Dai, A. Shaw, L. Li, L. Xiao, N. He, Z. Liu, J. Chen, and L. Song. SBEED: Convergent reinforcement learning with nonlinear function approximation. In Proceedings of the 35th International Conference on Machine Learning, pages 1125–1134, 2018.
  • C. Daskalakis and I. Panageas. The limit points of (optimistic) gradient descent in min-max optimization. In Advances in Neural Information Processing Systems, pages 9236–9246, 2018.
  • C. Daskalakis, A. Ilyas, V. Syrgkanis, and H. Zeng. Training gans with optimism. In International Conference on Learning Representations, 2018.
  • S. Du, J. Lee, H. Li, L. Wang, and X. Zhai. Gradient descent finds global minima of deep neural networks. In International Conference on Machine Learning, pages 1675–1685, 2019.
  • S. S. Du and W. Hu. Linear convergence of the primal-dual gradient method for convex-concave saddle point problems without strong convexity. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 196–205, 2019.
  • L. El Ghaoui and H. Lebret. Robust solutions to least-squares problems with uncertain data. SIAM Journal on matrix analysis and applications, 18(4):1035–1064, 1997.
  • F. Facchinei and J.-S. Pang. Finite-dimensional variational inequalities and complementarity problems. Springer Science & Business Media, 2007.
  • M. Fazel, R. Ge, S. Kakade, and M. Mesbahi. Global convergence of policy gradient methods for the linear quadratic regulator. In International Conference on Machine Learning, pages 1467–1476, 2018.
  • T. Fiez and L. Ratliff. Gradient descent-ascent provably converges to strict local minmax equilibria with a finite timescale separation. arXiv preprint arXiv:2009.14820, 2020.
  • T. Fiez, B. Chasnov, and L. J. Ratliff. Convergence of learning dynamics in stackelberg games. arXiv preprint arXiv:1906.01217, 2019.
  • T. Fiez, B. Chasnov, and L. Ratliff. Implicit learning dynamics in stackelberg games: Equilibria characterization, convergence analysis, and empirical study. In International Conference on Machine Learning (ICML), 2020.
  • G. Gidel, R. A. Hemmat, M. Pezeshki, R. Le Priol, G. Huang, S. Lacoste-Julien, and I. Mitliagkas. Negative momentum for improved game dynamics. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1802–1811, 2019.
  • I. Goodfellow, Y. Bengio, and A. Courville. Deep learning. MIT press, 2016.
  • Z. Guo, Z. Yuan, Y. Yan, and T. Yang. Fast objective and duality gap convergence for non-convex strongly-concave min-max problems. arXiv preprint arXiv:2006.06889, 2020.
  • Y.-P. Hsieh, P. Mertikopoulos, and V. Cevher. The limits of min-max optimization algorithms: convergence to spurious non-critical sets. arXiv preprint arXiv:2006.09065, 2020.
  • C. Jin, P. Netrapalli, and M. I. Jordan. What is local optimality in nonconvex-nonconcave minimax optimization? arXiv preprint arXiv:1902.00618, 2019.
  • R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in neural information processing systems, pages 315–323, 2013.
  • H. Karimi, J. Nutini, and M. Schmidt. Linear convergence of gradient and proximal-gradient methods under the polyak-łojasiewicz condition. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 795–811.
  • G. Korpelevich. The extragradient method for finding saddle points and other problems. Matecon, 12:747–756, 1976.
  • Q. Lin, M. Liu, H. Rafique, and T. Yang. Solving weakly-convex-weakly-concave saddlepoint problems as successive strongly monotone variational inequalities. arXiv preprint arXiv:1810.10207, 2018.
  • T. Lin, C. Jin, and M. I. Jordan. On gradient descent ascent for nonconvex-concave minimax problems. arXiv preprint arXiv:1906.00331, 2019.
  • T. Lin, C. Jin, M. Jordan, et al. Near-optimal algorithms for minimax optimization. arXiv preprint arXiv:2002.02417, 2020.
  • M.-Y. Liu and O. Tuzel. Coupled generative adversarial networks. In Advances in neural information processing systems, pages 469–477, 2016.
  • L. Luo, C. Chen, Y. Li, G. Xie, and Z. Zhang. A stochastic proximal point algorithm for saddle-point problems. arXiv preprint arXiv:1909.06946, 2019.
  • L. Luo, H. Ye, and T. Zhang. Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems. arXiv preprint arXiv:2001.03724, 2020.
  • Z.-Q. Luo and P. Tseng. Error bounds and convergence analysis of feasible descent methods: a general approach. Annals of Operations Research, 46(1):157–178, 1993.
  • A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
  • E. Mazumdar and L. J. Ratliff. On the convergence of gradient-based learning in continuous games. ArXiv e-prints, 2018.
  • E. Mazumdar, L. J. Ratliff, and S. S. Sastry. On gradient-based learning in continuous games. SIAM Journal on Mathematics of Data Science, 2(1):103–131, 2020.
  • L. Mescheder, A. Geiger, and S. Nowozin. Which training methods for gans do actually converge? In International Conference on Machine Learning, pages 3481–3490, 2018.
  • L. Metz, B. Poole, D. Pfau, and J. Sohl-Dickstein. Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163, 2016.
  • H. Namkoong and J. C. Duchi. Stochastic gradient methods for distributionally robust optimization with f-divergences. In Advances in Neural Information Processing Systems, pages 2208–2216, 2016.
  • H. Namkoong and J. C. Duchi. Variance-based regularization with convex objectives. In Advances in Neural Information Processing Systems, pages 2971–2980, 2017.
  • J. Nash. Two-person cooperative games. Econometrica: Journal of the Econometric Society, pages 128–140, 1953.
  • I. Necoara, Y. Nesterov, and F. Glineur. Linear convergence of first order methods for nonstrongly convex optimization. Mathematical Programming, pages 1–39, 2018.
  • Y. Nesterov and L. Scrimali. Solving strongly monotone variational and quasi-variational inequalities. Available at SSRN 970903, 2006.
  • M. Nouiehed, M. Sanjabi, T. Huang, J. D. Lee, and M. Razaviyayn. Solving a class of nonconvex min-max games using iterative first order methods. In Advances in Neural Information Processing Systems, pages 14905–14916, 2019.
  • D. M. Ostrovskii, A. Lowy, and M. Razaviyayn. Efficient search of first-order nash equilibria in nonconvex-concave smooth min-max problems. arXiv preprint arXiv:2002.07919, 2020.
  • B. Palaniappan and F. Bach. Stochastic variance reduction methods for saddle-point problems. In Advances in Neural Information Processing Systems, pages 1416–1424, 2016.
  • B. T. Polyak. Gradient methods for minimizing functionals. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, 3(4):643–653, 1963.
  • Q. Qian, S. Zhu, J. Tang, R. Jin, B. Sun, and H. Li. Robust optimization over multiple domains. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4739–4746, 2019.
  • S. J. Reddi, A. Hefny, S. Sra, B. Poczos, and A. Smola. Stochastic variance reduction for nonconvex optimization. In International Conference on Machine Learning, pages 314–323, 2016.
  • S. J. Reddi, S. Sra, B. Póczos, and A. Smola. Fast incremental method for smooth nonconvex optimization. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 1971–1977. IEEE, 2016.
  • A. Sinha, H. Namkoong, and J. Duchi. Certifying some distributional robustness with principled adversarial training. In International Conference on Learning Representations, 2018.
  • J. Sun, Q. Qu, and J. Wright. A geometric analysis of phase retrieval. Foundations of Computational Mathematics, 18(5):1131–1198, 2018.
  • K. K. Thekumparampil, P. Jain, P. Netrapalli, and S. Oh. Efficient algorithms for smooth minimax optimization. In Advances in Neural Information Processing Systems, pages 12659–12670, 2019.
  • J. Von Neumann, O. Morgenstern, and H. W. Kuhn. Theory of games and economic behavior (commemorative edition). Princeton university press, 2007.
  • L. Xiao and T. Zhang. A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization, 24(4):2057–2075, 2014.
  • T. Xu, Z. Wang, Y. Liang, and H. V. Poor. Enhanced first and zeroth order variance reduced algorithms for min-max optimization. arXiv preprint arXiv:2006.09361, 2020.
  • Z. Xu, H. Zhang, Y. Xu, and G. Lan. A unified single-loop alternating gradient projection algorithm for nonconvex-concave and convex-nonconcave minimax problems. arXiv preprint arXiv:2006.02032, 2020.
  • H. Zhang and W. Yin. Gradient methods for convex minimization: better rates under weaker conditions. arXiv preprint arXiv:1303.4645, 2013.
  • J. Zhang, M. Hong, and S. Zhang. On lower iteration complexity bounds for the saddle point problems. arXiv preprint arXiv:1912.07481, 2019.
  • K. Zhang, Z. Yang, and T. Basar. Policy optimization provably converges to nash equilibria in zero-sum linear quadratic games. In Advances in Neural Information Processing Systems, pages 11602–11614, 2019.
  • Y. Zhou, H. Zhang, and Y. Liang. Geometrical properties and accelerated gradient solvers of non-convex phase retrieval. In 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 331–335. IEEE, 2016.
Author
Junchi Yang