
# Tight last-iterate convergence rates for no-regret learning in multi-player games

NeurIPS 2020


Abstract

We study the question of obtaining last-iterate convergence rates for no-regret learning algorithms in multi-player games. We show that the optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ with respect to the gap function in smooth monotone games. This resu…

Introduction
• In the setting of multi-agent online learning ([SS11, CBL06]), K players interact with each other over time.

At each time step t, each player k ∈ {1, . . . , K} chooses an action z_k^{(t)}; z_k^{(t)} may represent, for instance, the bidding strategy of an advertiser at time t.
• Average-iterate guarantees fail to capture the game dynamics over time ([MPP17]), and both types of guarantees use newly acquired information with decreasing weight, which, as remarked by [LZMJ20], is very unnatural from an economic perspective. Hence the following question (⋆) is of particular interest ([MZ18, LZMJ20, MPP17, DISZ17]): can last-iterate rates be established if all players act according to a no-regret learning algorithm with constant step size?
Highlights
• In the setting of multi-agent online learning ([SS11, CBL06]), K players interact with each other over time.

At each time step t, each player k ∈ {1, . . . , K} chooses an action z_k^{(t)}; z_k^{(t)} may represent, for instance, the bidding strategy of an advertiser at time t
• A fundamental quantity used to measure the performance of an online learning algorithm is the regret of player k, which is the difference between the total loss of player k over T time steps and the loss of the best possible action in hindsight; formally, if f_k^{(t)} denotes player k's loss function at time t, the regret at time T is Reg_k(T) = Σ_{t=1}^T f_k^{(t)}(z_k^{(t)}) − min_z Σ_{t=1}^T f_k^{(t)}(z)
• We show in Theorem 5 and Corollary 6 that the actions taken by learners following the optimistic gradient (OG) algorithm, which is no-regret, exhibit last-iterate convergence to a Nash equilibrium in smooth, monotone games at a rate of O(1/√T) in terms of the global gap function
• As in prior work proving lower bounds for p-stationary canonical linear iterative methods (p-SCLIs) ([ASSS15, IAGM19]), we reduce the problem of proving a lower bound on the total gap function of the last iterate z^{(t)} to the problem of proving a lower bound on the supremum of the spectral norms of a family of polynomials
• In this paper we proved tight last-iterate convergence rates for smooth monotone games when all players act according to the optimistic gradient algorithm, which is no-regret
• As for lower bounds, it would be interesting to determine whether an algorithm-independent lower bound of Ω(1/√T) in the context of Theorem 7 could be proven for stationary p-SCLIs
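The OG update behind these results admits a compact numerical illustration. The following is a minimal sketch, not the paper's code: the bilinear test game min_x max_y xy, the step size, and the horizon are all illustrative choices. The game's operator is F(x, y) = (y, −x) and its unique Nash equilibrium is the origin, so the norm of the last iterate directly measures distance to equilibrium:

```python
import numpy as np

def F(z):
    # Monotone operator of the bilinear game min_x max_y x*y: F(x, y) = (y, -x).
    x, y = z
    return np.array([y, -x])

eta = 0.1                           # constant step size (illustrative)
z_prev = z = np.array([1.0, 1.0])   # take z^{(-1)} = z^{(0)}
for _ in range(2000):
    # Optimistic gradient: z^{(t+1)} = z^{(t)} - 2*eta*F(z^{(t)}) + eta*F(z^{(t-1)})
    z, z_prev = z - 2 * eta * F(z) + eta * F(z_prev), z

# The last iterate approaches the unique Nash equilibrium (0, 0).
```

Plain gradient descent-ascent with the same constant step size spirals outward on this game; the single extra "optimism" term, which reuses the previous gradient, is what stabilizes the last iterate.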
Results
• The authors show in Theorem 5 and Corollary 6 that the actions taken by learners following the optimistic gradient (OG) algorithm, which is no-regret, exhibit last-iterate convergence to a Nash equilibrium in smooth, monotone games at a rate of O(1/√T) in terms of the global gap function.
• The extragradient (EG) algorithm exhibits last-iterate convergence at a rate of O(1/√T) in smooth monotone games when all players play according to it ([GPDO20]), but it is straightforward to see that EG is not a no-regret learning algorithm, i.e., for an adversarial loss function its regret can be linear in T.
• It has been shown ([DP18, LNPW20]) that a modification of OG known as optimistic multiplicative-weights update (OMWU) exhibits last-iterate convergence to Nash equilibria in two-player zero-sum monotone games, but as with the unconstrained case ([MOP19a]) non-asymptotic rates are unknown.
• To the best of the authors' knowledge, the only work proving last-iterate convergence rates for general smooth monotone VIs is [GPDO20], which treated only the EG algorithm, which is not no-regret.
• The following essentially optimal regret bound is well-known for the OG algorithm, when the actions of the other players z_{−k}^{(t)} are adversarial: Proposition 3.
• The following Theorem 7 uses functions in the class F_{n,D}^{b,ill} as “hard instances” to show that the O(1/√T) rate of Corollary 6 cannot be improved by more than an algorithm-dependent constant factor.
• Using Proposition 8, the authors show that any p-SCLI algorithm must have a rate of at least Ω_A(1/T) for smooth convex function minimization. This is slower than the O(1/T²) error achievable with Nesterov’s AGD with a time-varying learning rate.
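For contrast with OG, the extragradient method discussed above can be sketched on the same toy bilinear game min_x max_y xy with operator F(x, y) = (y, −x); the step size and horizon below are arbitrary illustrative choices, not the paper's. Note that each EG iteration makes two gradient calls, querying the operator at an extrapolated midpoint, which is why EG is not implementable as an online no-regret algorithm that observes one loss per round:

```python
import numpy as np

def F(z):
    # Operator of the bilinear game min_x max_y x*y: F(x, y) = (y, -x).
    x, y = z
    return np.array([y, -x])

eta = 0.1
z = np.array([1.0, 1.0])
for _ in range(2000):
    z_mid = z - eta * F(z)       # extrapolation step: first gradient call
    z = z - eta * F(z_mid)       # update step: second gradient call, at the midpoint

# The last iterate approaches the unique Nash equilibrium (0, 0).
```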
Conclusion
• In this paper the authors proved tight last-iterate convergence rates for smooth monotone games when all players act according to the optimistic gradient algorithm, which is no-regret.
• [DP18, LNPW20] showed that OMWU exhibits last-iterate convergence, but non-asymptotic rates remain unknown even for the case that F_G(·) is linear, which includes finite-action polymatrix games.
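The OMWU dynamics mentioned above can be illustrated on matching pennies, the simplest constrained zero-sum game, whose unique Nash equilibrium is uniform play by both players. This is a sketch under assumed illustrative parameters (step size, horizon, and initial mixed strategies are arbitrary choices), not the authors' experiment:

```python
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies payoff matrix
eta = 0.05
x = np.array([0.8, 0.2])                   # row player, minimizes x @ A @ y
y = np.array([0.3, 0.7])                   # column player, maximizes x @ A @ y
g_prev, h_prev = A @ y, A.T @ x            # "previous" vectors for the first step
for _ in range(10000):
    g, h = A @ y, A.T @ x                  # current loss/payoff vectors
    # OMWU: multiplicative-weights step with the optimistic term 2*g - g_prev
    x = x * np.exp(-eta * (2 * g - g_prev)); x /= x.sum()
    y = y * np.exp( eta * (2 * h - h_prev)); y /= y.sum()
    g_prev, h_prev = g, h

# Both last iterates approach the uniform equilibrium (1/2, 1/2).
```

Dropping the optimistic correction (i.e., running plain multiplicative weights) destroys this behavior, consistent with the non-convergence results discussed in the related work.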
Tables
• Table 1: Known last-iterate convergence rates for learning in smooth monotone games with perfect gradient feedback (i.e., deterministic algorithms). We specialize to the two-player zero-sum case in presenting prior work, since some papers in the literature only consider this setting. Recall that a game G has a γ-singular value lower bound if for all z, all singular values of ∂F_G(z) are ≥ γ. l, Λ are the Lipschitz constants of F_G, ∂F_G, respectively, and c, C > 0 are absolute constants, where c is sufficiently small and C is sufficiently large. Upper bounds in the left-hand column are for the EG algorithm, and lower bounds are for a general form of 1-SCLI methods which includes EG. Upper bounds in the right-hand column are for algorithms which are implementable as online no-regret learning algorithms
• Table 2: Known upper bounds on last-iterate convergence rates for learning in smooth monotone games with noisy gradient feedback (i.e., stochastic algorithms). Rows of the table are as in Table 1; l, Λ are the Lipschitz constants of F_G, ∂F_G, respectively, and c > 0 is a sufficiently small absolute constant. The right-hand column contains algorithms implementable as online no-regret learning algorithms: stochastic optimistic gradient (Stoch. OG) or stochastic gradient descent (SGD). The left-hand column contains algorithms not implementable as no-regret algorithms, which includes stochastic extragradient (Stoch. EG), stochastic forward-backward (FB) splitting, double stepsize extragradient (DSEG), and stochastic variance reduced extragradient (SVRE). SVRE only applies in the finite-sum setting, which is a special case of (Abs) in which f_k is a sum of m individual loss functions f_{k,i}, and a noisy gradient is obtained as ∇f_{k,i} for a random i ∈ [m]. Due to the stochasticity, many prior works make use of a step size ηt that decreases with t; we make note of whether this is the case (“ηt decr.”) or whether the step size ηt can be constant (“ηt const.”). For simplicity of presentation we assume Ω(1/t) ≤ {τt, σt} ≤ O(1) for all t ≥ 0 in all cases for which σt, τt vary with t. Reported bounds are stated for the total gap function (Definition 3); leading constants and factors depending on distance between initialization and optimum are omitted
Related work
• Multi-agent learning in games. In the constrained setting, many papers have studied conditions under which the action profile of no-regret learning algorithms, often variants of Follow-The-Regularized-Leader (FTRL), converges to equilibrium. However, these works all assume either a learning rate that decreases over time ([MZ18, ZMB+17, ZMA+18, ZMM+17]) or else apply only to specific types of potential games ([KKDB15, KBTB18, PPP17, KPT09, CL16, BEDL06, PP14]), which significantly facilitates the analysis of last-iterate convergence.

Such potential games are in general incomparable with monotone games, and do not even include finite-action two-player zero-sum games (i.e., matrix games). In fact, [BP18] showed that the actions of players following FTRL in two-player zero-sum matrix games diverge from interior Nash equilibria. Many other works ([HMC03, MPP17, KLP11, DFP+10, BCM12, PP16]) establish similar non-convergence results in both discrete and continuous time for various types of monotone games, including zero-sum polymatrix games. Such non-convergence includes chaotic behavior such as Poincaré recurrence, which showcases the insufficiency of on-average convergence (which holds in such settings) and so is additional motivation for the question (⋆).
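The divergence result of [BP18] is easy to reproduce numerically. The sketch below (step size, horizon, and starting point are illustrative assumptions) runs vanilla multiplicative-weights update with a constant step size on matching pennies, starting near the interior Nash equilibrium; the last iterates spiral outward toward the boundary rather than converging:

```python
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies payoff matrix
eta = 0.1
x = np.array([0.6, 0.4])                   # row player (minimizer), near uniform
y = np.array([0.6, 0.4])                   # column player (maximizer), near uniform
d0 = np.linalg.norm(np.concatenate([x, y]) - 0.5)   # initial distance to equilibrium
for _ in range(2000):
    g, h = A @ y, A.T @ x
    x = x * np.exp(-eta * g); x /= x.sum()   # plain MWU, constant step size
    y = y * np.exp( eta * h); y /= y.sum()
d1 = np.linalg.norm(np.concatenate([x, y]) - 0.5)   # final distance to equilibrium

# d1 > d0: the joint action profile has moved away from the interior equilibrium.
```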
References
• [Ahl79] L.V. Ahlfors. Complex Analysis. McGraw-Hill, 1979.
• [ALW19] Jacob Abernethy, Kevin A. Lai, and Andre Wibisono. Last-iterate convergence rates for minmax optimization. arXiv:1906.02027 [cs, math, stat], June 2019. arXiv: 1906.02027.
• [AMLJG19] Waïss Azizian, Ioannis Mitliagkas, Simon Lacoste-Julien, and Gauthier Gidel. A Tight and Unified Analysis of Extragradient for a Whole Spectrum of Differentiable Games. arXiv:1906.05945 [cs, math, stat], June 2019. arXiv: 1906.05945.
• Yossi Arjevani and Ohad Shamir. On the Iteration Complexity of Oblivious First-Order Optimization Algorithms. arXiv:1605.03529 [cs, math], May 2016. arXiv: 1605.03529.
• [ASM+20] Waïss Azizian, Damien Scieur, Ioannis Mitliagkas, Simon Lacoste-Julien, and Gauthier Gidel. Accelerating Smooth Games by Manipulating Spectral Shapes. arXiv:2001.00602 [cs, math, stat], January 2020. arXiv: 2001.00602.
• [ASSS15] Yossi Arjevani, Shai Shalev-Shwartz, and Ohad Shamir. On Lower and Upper Bounds for Smooth and Strongly Convex Optimization Problems. arXiv:1503.06833 [cs, math], March 2015. arXiv: 1503.06833.
• [BCM12] Maria-Florina Balcan, Florin Constantin, and Ruta Mehta. The Weighted Majority Algorithm does not Converge in Nearly Zero-sum Games. In ICML Workshop on Markets, Mechanisms, and Multi-Agent Models, 2012.
• [BEDL06] Avrim Blum, Eyal Even-Dar, and Katrina Ligett. Routing without regret: on convergence to nash equilibria of regret-minimizing algorithms in routing games. In Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing - PODC ’06, page 45, Denver, Colorado, USA, 2006. ACM Press.
• LM Bregman and IN Fokin. Methods of determining equilibrium situations in zero-sum polymatrix games. Optimizatsia, 40(57):70–82, 1987.
• Nikhil Bansal and Anupam Gupta. Potential-Function Proofs for First-Order Methods. arXiv:1712.04581 [cs, math], December 2017. arXiv: 1712.04581.
• [BP18] James P. Bailey and Georgios Piliouras. Multiplicative Weights Update in Zero-Sum Games. In Proceedings of the 2018 ACM Conference on Economics and Computation - EC ’18, pages 321–338, Ithaca, NY, USA, 2018. ACM Press.
• [BTHK15] Daan Bloembergen, Karl Tuyls, Daniel Hennes, and Michael Kaisers. Evolutionary Dynamics of Multi-Agent Learning: A Survey. Journal of Artificial Intelligence Research, 53:659–697, August 2015.
• [CBL06] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, Cambridge; New York, 2006.
• [CCDP16] Yang Cai, Ozan Candogan, Constantinos Daskalakis, and Christos Papadimitriou. Zero-Sum Polymatrix Games: A Generalization of Minmax. Mathematics of Operations Research, 41(2):648–655, May 2016.
• Yang Cai and Constantinos Daskalakis. On minmax theorems for multiplayer games. In Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete algorithms, pages 217–234. SIAM, 2011.
• [CGFLJ19] Tatjana Chavdarova, Gauthier Gidel, François Fleuret, and Simon Lacoste-Julien. Reducing Noise in GAN Training with Variance Reduced Extragradient. arXiv:1904.08598 [cs, math, stat], April 2019. arXiv: 1904.08598.
• [CL16] Po-An Chen and Chi-Jen Lu. Generalized mirror descents in congestion games. Artificial Intelligence, 241:217–243, December 2016.
• [CYL+12] Chao-Kai Chiang, Tianbao Yang, Chia-Jung Lee, Mehrdad Mahdavi, Chi-Jen Lu, Rong Jin, and Shenghuo Zhu. Online Optimization with Gradual Variations. In Proceedings of the 25th Annual Conference on Learning Theory, page 20, 2012.
• [DFP+10] Constantinos Daskalakis, Rafael Frongillo, Christos H. Papadimitriou, George Pierrakos, and Gregory Valiant. On Learning Algorithms for Nash Equilibria. In Algorithmic Game Theory, volume 6386, pages 114–125. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
• [DISZ17] Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis, and Haoyang Zeng. Training GANs with Optimism. arXiv:1711.00141 [cs, stat], October 2017. arXiv: 1711.00141.
• [DP18] Constantinos Daskalakis and Ioannis Panageas. Last-Iterate Convergence: Zero-Sum Games and Constrained Min-Max Optimization. arXiv:1807.04252 [cs, math, stat], July 2018. arXiv: 1807.04252.
• [EDMN09] Eyal Even-Dar, Yishay Mansour, and Uri Nadav. On the convergence of regret minimization dynamics in concave games. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 523–532, 2009.
• Alireza Fallah, Asuman Ozdaglar, and Sarath Pattathil. An Optimal Multistage Stochastic Gradient Method for Minimax Problems. arXiv:2002.05683 [cs, math, stat], February 2020. arXiv: 2002.05683.
• Francisco Facchinei and Jong-Shi Pang. Finite-dimensional variational inequalities and complementarity problems. Springer series in operations research. Springer, New York, 2003.
• [GBV+18] Gauthier Gidel, Hugo Berard, Gaëtan Vignoud, Pascal Vincent, and Simon Lacoste-Julien. A Variational Inequality Perspective on Generative Adversarial Networks. arXiv:1802.10551 [cs, math, stat], February 2018. arXiv: 1802.10551.
• [GHP+18] Gauthier Gidel, Reyhane Askari Hemmat, Mohammad Pezeshki, Remi Lepriol, Gabriel Huang, Simon Lacoste-Julien, and Ioannis Mitliagkas. Negative Momentum for Improved Game Dynamics. arXiv:1807.04740 [cs, stat], July 2018. arXiv: 1807.04740.
• [GPDO20] Noah Golowich, Sarath Pattathil, Constantinos Daskalakis, and Asuman Ozdaglar. Last Iterate is Slower than Averaged Iterate in Smooth Convex-Concave Saddle Point Problems. In arXiv: 2002.00057, 2020.
• [HIMM19] Yu-Guan Hsieh, Franck Iutzeler, Jérome Malick, and Panayotis Mertikopoulos. On the convergence of single-call stochastic extra-gradient methods. arXiv:1908.08465 [cs, math], August 2019. arXiv: 1908.08465.
• [HIMM20] Yu-Guan Hsieh, Franck Iutzeler, Jerome Malick, and Panayotis Mertikopoulos. Explore Aggressively, Update Conservatively: Stochastic Extragradient Methods with Variable Stepsize Scaling. arXiv:2003.10162, page 27, 2020.
• Roger A. Horn and Charles R. Johnson. Matrix analysis. Cambridge University Press, Cambridge; New York, 2nd ed edition, 2012.
• [HMC03] Sergiu Hart and Andreu Mas-Colell. Uncoupled Dynamics Do Not Lead to Nash Equilibrium. THE AMERICAN ECONOMIC REVIEW, 93(5):7, 2003.
• [IAGM19] Adam Ibrahim, Waïss Azizian, Gauthier Gidel, and Ioannis Mitliagkas. Linear Lower Bounds and Conditioning of Differentiable Games. arXiv:1906.07300 [cs, math, stat], October 2019. arXiv: 1906.07300.
• [KBTB18] Walid Krichene, Mohamed Chedhli Bourguiba, Kiet Tlam, and Alexandre Bayen. On Learning How Players Learn: Estimation of Learning Dynamics in the Routing Game. ACM Trans. Cyber-Phys. Syst., 2(1):6:1–6:23, January 2018.
• [KKDB15] Syrine Krichene, Walid Krichene, Roy Dong, and Alexandre Bayen. Convergence of heterogeneous distributed learning in stochastic routing games. In 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 480–487, Monticello, IL, September 2015. IEEE.
• [KLP11] Robert Kleinberg, Katrina Ligett, and Georgios Piliouras. Beyond the Nash Equilibrium Barrier. page 15, 2011.
• G. M. Korpelevich. The extragradient method for finding saddle points and other problems. Ekonomika i Matem. Metody, 12(4):747–756, 1976.
• Victor Kozyakin. On accuracy of approximation of the spectral radius by the Gelfand formula. Linear Algebra and its Applications, 431:2134–2141, 2009.
• [KPT09] Robert Kleinberg, Georgios Piliouras, and Eva Tardos. Multiplicative updates outperform generic no-regret learning in congestion games: extended abstract. In Proceedings of the 41st annual ACM symposium on Symposium on theory of computing - STOC ’09, page 533, Bethesda, MD, USA, 2009. ACM Press.
• Aswin Kannan and Uday V. Shanbhag. Pseudomonotone Stochastic Variational Inequality Problems: Analysis and Optimal Stochastic Approximation Schemes. Computational Optimization and Applications, 74:669–820, 2019.
• [LBJM+20] Nicolas Loizou, Hugo Berard, Alexia Jolicoeur-Martineau, Pascal Vincent, Simon Lacoste-Julien, and Ioannis Mitliagkas. Stochastic Hamiltonian Gradient Methods for Smooth Games. arXiv:2007.04202 [cs, math, stat], July 2020. arXiv: 2007.04202.
• [LNPW20] Qi Lei, Sai Ganesh Nagarajan, Ioannis Panageas, and Xiao Wang. Last iterate convergence in no-regret learning: constrained min-max optimization for convex-concave landscapes. arXiv:2002.06768 [cs, stat], February 2020. arXiv: 2002.06768.
• Tengyuan Liang and James Stokes. Interaction Matters: A Note on Non-asymptotic Local Convergence of Generative Adversarial Networks. arXiv:1802.06132 [cs, stat], February 2018. arXiv: 1802.06132.
• [LZMJ20] Tianyi Lin, Zhengyuan Zhou, Panayotis Mertikopoulos, and Michael I. Jordan. Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games. arXiv:2002.09806 [cs, math, stat], February 2020. arXiv: 2002.09806.
• [MKS+19] Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtárik, and Yura Malitsky. Revisiting Stochastic Extragradient. arXiv:1905.11373 [cs, math], May 2019. arXiv: 1905.11373.
• [MOP19a] Aryan Mokhtari, Asuman Ozdaglar, and Sarath Pattathil. Convergence Rate of $\mathcal{O}(1/k)$ for Optimistic Gradient and Extra-gradient Methods in Smooth Convex-Concave Saddle Point Problems. arXiv:1906.01115 [cs, math, stat], June 2019. arXiv: 1906.01115.
• [MOP19b] Aryan Mokhtari, Asuman Ozdaglar, and Sarath Pattathil. A Unified Analysis of Extragradient and Optimistic Gradient Methods for Saddle Point Problems: Proximal Point Approach. arXiv:1901.08511 [cs, math, stat], January 2019. arXiv: 1901.08511.
• Barnabé Monnot and Georgios Piliouras. Limits and limitations of no-regret learning in games. The Knowledge Engineering Review, 32, 2017.
• [MPP17] Panayotis Mertikopoulos, Christos Papadimitriou, and Georgios Piliouras. Cycles in adversarial regularized learning. arXiv:1709.02738 [cs], September 2017. arXiv: 1709.02738.
• H. Moulin and J. P. Vial. Strategically zero-sum games: The class of games whose completely mixed equilibria cannot be improved upon. Int J Game Theory, 7(3):201–221, September 1978.
• [MZ18] Panayotis Mertikopoulos and Zhengyuan Zhou. Learning in games with continuous action sets and unknown payoff functions. arXiv:1608.07310 [cs, math], January 2018. arXiv: 1608.07310 version: 2.
• Arkadi Nemirovski. Prox-Method with Rate of Convergence O(1/t) for Variational Inequalities with Lipschitz Continuous Monotone Operators and Smooth Convex-Concave Saddle Point Problems. SIAM Journal on Optimization, 15(1):229–251, January 2004.
• M. C. Nesterov. Introductory Lectures on Convex Programming. North-Holland, 1975.
• Yurii Nesterov. Cubic Regularization of Newton’s Method for Convex Problems with Constraints. SSRN Electronic Journal, 2006.
• Yurii Nesterov. Primal-dual subgradient methods for convex problems. Math. Program., 120(1):221–259, August 2009.
• Olavi Nevanlinna. Convergence of Iterations for Linear Equations. Birkhäuser Basel, Basel, 1993.
• Yuyuan Ouyang and Yangyang Xu. Lower complexity bounds of first-order methods for convexconcave bilinear saddle-point problems. Mathematical Programming, August 2019.
• Balamurugan Palaniappan and Francis Bach. Stochastic Variance Reduction Methods for Saddle-Point Problems. In Proceedings of the 30th International Conference on Neural Information Processing Systems, pages 1416–1424, 2016.
• Boris T. Polyak. Introduction to optimization., volume 1. Optimization Software, 1987.
• L. D. Popov. A modification of the Arrow-Hurwicz method for search of saddle points. Mathematical Notes of the Academy of Sciences of the USSR, 28(5):845–848, November 1980.
• [PP14] Ioannis Panageas and Georgios Piliouras. Average Case Performance of Replicator Dynamics in Potential Games via Computing Regions of Attraction. arXiv:1403.3885 [cs, math], 2014. arXiv: 1403.3885.
• [PP16] Christos Papadimitriou and Georgios Piliouras. From Nash Equilibria to Chain Recurrent Sets: Solution Concepts and Topology. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science - ITCS ’16, pages 227–235, Cambridge, Massachusetts, USA, 2016. ACM Press.
• [PPP17] Gerasimos Palaiopanos, Ioannis Panageas, and Georgios Piliouras. Multiplicative Weights Update with Constant Step-Size in Congestion Games: Convergence, Limit Cycles and Chaos. arXiv:1703.01138 [cs], March 2017. arXiv: 1703.01138.
• J.B. Rosen. Existence and Uniqueness of Equilibrium Points for Concave N-Person Games. Econometrica, 33(3):520–534, 1965.
• Alexander Rakhlin and Karthik Sridharan. Online Learning with Predictable Sequences. arXiv:1208.3728 [cs, stat], August 2012. arXiv: 1208.3728.
• Alexander Rakhlin and Karthik Sridharan. Optimization, Learning, and Games with Predictable Sequences. arXiv:1311.1869 [cs], November 2013. arXiv: 1311.1869.
• Lorenzo Rosasco, Silvia Villa, and Bang Cong Vu. A Stochastic forward-backward splitting method for solving monotone inclusions in Hilbert spaces. Journal of Optimization Theory and Applications, 169:388–406, 2016. arXiv: 1403.7999.
• [SS11] Shai Shalev-Shwartz. Online Learning and Online Convex Optimization. Foundations and Trends® in Machine Learning, 4(2):107–194, 2011.
• Paul Tseng. On linear convergence of iterative methods for the variational inequality problem. Journal of Computational and Applied Mathematics, 60(1-2):237–252, June 1995.
• Yannick Viossat and Andriy Zapechelnyuk. No-regret Dynamics and Fictitious Play. Journal of Economic Theory, 148(2):825–842, March 2013. arXiv: 1207.0660.
• [WRJ18] Ashia C. Wilson, Benjamin Recht, and Michael I. Jordan. A Lyapunov Analysis of Momentum Methods in Optimization. arXiv:1611.02635 [cs, math], March 2018. arXiv: 1611.02635.
• [YSX+17] Abhay Yadav, Sohil Shah, Zheng Xu, David Jacobs, and Tom Goldstein. Stabilizing Adversarial Nets With Prediction Methods. arXiv:1705.07364 [cs], May 2017. arXiv: 1705.07364.
• [ZMA+18] Zhengyuan Zhou, Panayotis Mertikopoulos, Susan Athey, Nicholas Bambos, Peter W Glynn, and Yinyu Ye. Learning in Games with Lossy Feedback. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 5134–5144. Curran Associates, Inc., 2018.
• [ZMB+17] Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter W Glynn, and Claire Tomlin. Countering Feedback Delays in Multi-Agent Learning. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 6171–6181. Curran Associates, Inc., 2017.
• [ZMM+17] Zhengyuan Zhou, Panayotis Mertikopoulos, Aris L. Moustakas, Nicholas Bambos, and Peter Glynn. Mirror descent learning in continuous games. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 5776–5783, Melbourne, Australia, December 2017. IEEE.
• [ZMM+20] Zhengyuan Zhou, Panayotis Mertikopoulos, Aris L Moustakas, Nicholas Bambos, and Peter Glynn. Robust Power Management via Learning and Game Design. Operations Research, 2020.
Author
Noah Golowich
Sarath Pattathil