# Small Nash Equilibrium Certificates in Very Large Games

NeurIPS 2020.

Abstract:

In many game settings, the game is not explicitly given but is only accessible by playing it. While there have been impressive demonstrations in such settings, prior techniques have not offered safety guarantees, that is, guarantees on the game-theoretic exploitability of the computed strategies. In this paper we introduce an approach t…


Introduction

- Recent years have witnessed AI breakthroughs in games such as poker [5, 27, 10, 12] where the rules are given.
- In many important applications—such as many war games and finance simulations—the rules are only given via black-box access, that is, via playing the game [34, 24], and one can try to construct good strategies by self play
- In such settings, deep reinforcement learning techniques are typically used today [16, 31, 24, 32, 33, 2].
- A recent PAC-learning algorithm has logarithmic sample complexity for pure maxmin strategies in normal-form games; it extends to some infinite games, but not effectively to mixed strategies in extensive-form games [26]

Highlights

- We show that a certificate can be verified in time linear in the size of the certificate, without expanding the remainder of the game tree
- We prove that extensive-form games do not always have such certificates, but that under a certain informational assumption they do. We show that it is NP-hard to approximate the smallest certificate of a game to within a logarithmic factor, even in the zero-sum setting, and we give an exponential lower bound on the time complexity of solving a black-box game as a function of the size of its smallest certificate
- We presented a notion of certificate for general extensive-form games that allows verification of exact and approximate Nash equilibria without expanding the whole game tree
- We presented algorithms for both verifying a certificate and computing the optimal certificate given the currently explored trunk of a game
- Our experiments showed that many large or even infinite games have small certificates, allowing us to find equilibria while exploring a vanishingly small portion of the game

Methods

- The authors conducted experiments using the algorithm in Section 6 on the following common zero-sum benchmark games. (1) A zero-sum variant of the search game [4]. (2) k-rank Goofspiel.
- Players place bids for a prize of value t.
- In the perfect-information (PI) variant, P2 knows P1’s bid while bidding, and bids are made public after each round.
- This creates a perfect-information game in which P2 has a large advantage, and in which the authors expect a certificate of size O(√N).
- The possible payoffs in the game, and the length of the game, are both unbounded
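The intuition that optimal play can be certified while expanding only a small part of a perfect-information game tree is classical: alpha-beta pruning [18] proves the game value while skipping subtrees that provably cannot matter. The following toy sketch illustrates that classical idea only; it is not the paper's certificate algorithm, and the example tree is hypothetical.

```python
# Alpha-beta search on a small zero-sum game tree. Like a small certificate,
# it proves the value of optimal play without expanding the whole tree.

def alphabeta(node, alpha, beta, maximizing, expanded):
    if isinstance(node, (int, float)):  # leaf: payoff to the maximizing player
        return node
    value = float('-inf') if maximizing else float('inf')
    for child in node:
        expanded.append(child)  # record every node we actually expand
        v = alphabeta(child, alpha, beta, not maximizing, expanded)
        if maximizing:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:  # remaining siblings cannot change the root value
            break
    return value

# Depth-2 tree: the max player picks a subtree, the min player picks a leaf.
tree = [[3, 5], [2, 9, 7], [1, 4]]
expanded = []
value = alphabeta(tree, float('-inf'), float('inf'), True, expanded)
print(value, len(expanded))  # value 3; only 7 of the 10 non-root nodes expanded
```

The unexpanded leaves (9, 7, and 4) play the role of the unexplored remainder of the game: the certificate of optimality does not depend on them.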

Conclusion

- The authors presented a notion of certificate for general extensive-form games that allows verification of exact and approximate Nash equilibria without expanding the whole game tree.
- The authors pose three directions for future research: 1) What is the best way to balance sampling, game tree exploration, and equilibrium finding? 2) Seek algorithms for finding certificates that give stronger guarantees of optimality than Theorem 6.10, especially in the case of infinite games with unbounded utilities. 3) Seek algorithms with stronger guarantees than that implied by Proposition 4.1 for verifying the Nash gap of a given strategy profile; for example, is it possible to construct the smallest trunk for which a given σ is an ε-equilibrium?
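The Nash-gap verification question can be made concrete in the simplest setting, a zero-sum matrix game, where the gap of a profile (x, y) is the total amount the players could gain by best-responding; it is 0 exactly at a Nash equilibrium, and the profile is an ε-equilibrium iff the gap is at most ε. This is a minimal sketch of the definition (our illustration, not the paper's verification algorithm), and it assumes the full payoff matrix is known, whereas the paper's harder question is bounding this gap from an explored trunk alone:

```python
# Nash gap (exploitability) of a mixed-strategy profile in a zero-sum
# matrix game. A[i][j] is the payoff to player 1, who maximizes; player 2
# minimizes. x and y are the players' mixed strategies.

def nash_gap(A, x, y):
    m, n = len(A), len(A[0])
    # Expected payoff under (x, y).
    value = sum(x[i] * A[i][j] * y[j] for i in range(m) for j in range(n))
    # Player 1's best-response payoff against y.
    br1 = max(sum(A[i][j] * y[j] for j in range(n)) for i in range(m))
    # Player 2's best-response payoff against x (player 2 minimizes).
    br2 = min(sum(x[i] * A[i][j] for i in range(m)) for j in range(n))
    # Total incentive to deviate.
    return (br1 - value) + (value - br2)

# Matching pennies: uniform play is the unique equilibrium.
A = [[1, -1], [-1, 1]]
print(nash_gap(A, [0.5, 0.5], [0.5, 0.5]))  # 0.0: an exact equilibrium
print(nash_gap(A, [1.0, 0.0], [0.5, 0.5]))  # 1.0: player 2 can exploit x
```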

Summary


- Table 1: Experimental results. The minimal certificate is a certificate after removing all unnecessary nodes per Proposition 4.1. Percentages are relative to game size. Leduc variants have infinite size; for them, the reported “game size” is that of the trunk with the number of consecutive raises restricted to 12

References

- Nicola Basilico and Nicola Gatti. Automated abstractions for patrolling security games. In AAAI Conference on Artificial Intelligence (AAAI), 2011.
- Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Debiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, et al. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680, 2019.
- Darse Billings, Neil Burch, Aaron Davidson, Robert Holte, Jonathan Schaeffer, Terence Schauenberg, and Duane Szafron. Approximating game-theoretic optimal strategies for full-scale poker. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2003.
- Branislav Bošanský and Jiří Čermák. Sequence-form algorithm for computing Stackelberg equilibria in extensive-form games. In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
- Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. Heads-up limit hold’em poker is solved. Science, 347(6218), January 2015.
- Noam Brown, Christian Kroer, and Tuomas Sandholm. Dynamic thresholding and pruning for regret minimization. In AAAI Conference on Artificial Intelligence (AAAI), 2017.
- Noam Brown and Tuomas Sandholm. Regret-based pruning in extensive-form games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2015.
- Noam Brown and Tuomas Sandholm. Simultaneous abstraction and equilibrium finding in games. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015.
- Noam Brown and Tuomas Sandholm. Reduced space and faster convergence in imperfect-information games via pruning. In International Conference on Machine Learning (ICML), 2017.
- Noam Brown and Tuomas Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, page eaao1733, Dec. 2017.
- Noam Brown and Tuomas Sandholm. Solving imperfect-information games via discounted regret minimization. In AAAI Conference on Artificial Intelligence (AAAI), 2019.
- Noam Brown and Tuomas Sandholm. Superhuman AI for multiplayer poker. Science, 365(6456):885–890, 2019.
- Jiří Čermák, Branislav Bošanský, and Viliam Lisý. An algorithm for constructing and solving imperfect recall abstractions of large extensive-form games. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 936–942, 2017.
- Andrew Gilpin and Tuomas Sandholm. A competitive Texas Hold’em poker player via automated abstraction and real-time equilibrium computation. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 1007–1013, 2006.
- Gurobi Optimization, LLC. Gurobi optimizer reference manual, 2019.
- Johannes Heinrich and David Silver. Deep reinforcement learning from self-play in imperfect-information games. arXiv preprint arXiv:1603.01121, 2016.
- Samid Hoda, Andrew Gilpin, Javier Peña, and Tuomas Sandholm. Smoothing techniques for computing Nash equilibria of sequential games. Mathematics of Operations Research, 35(2), 2010.
- Donald E Knuth and Ronald W Moore. An analysis of alpha-beta pruning. Artificial Intelligence, 6(4):293–326, 1975.
- Daphne Koller, Nimrod Megiddo, and Bernhard von Stengel. Fast algorithms for finding randomized strategies in game trees. In Proceedings of the 26th ACM Symposium on Theory of Computing (STOC), 1994.
- Christian Kroer and Tuomas Sandholm. Extensive-form game abstraction with bounds. In Proceedings of the ACM Conference on Economics and Computation (EC), 2014.
- Christian Kroer and Tuomas Sandholm. A unified framework for extensive-form game abstraction with bounds. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2018.
- Christian Kroer, Kevin Waugh, Fatma Kılınç-Karzan, and Tuomas Sandholm. Faster algorithms for extensive-form game solving via improved smoothing functions. Mathematical Programming, 2020.
- Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. Monte Carlo sampling for regret minimization in extensive games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2009.
- Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Pérolat, David Silver, and Thore Graepel. A unified game-theoretic approach to multiagent reinforcement learning. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 4190–4203, 2017.
- Richard Lipton, Evangelos Markakis, and Aranyak Mehta. Playing large games using simple strategies. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), pages 36–41, San Diego, CA, 2003. ACM.
- Alberto Marchesi, Francesco Trovò, and Nicola Gatti. Learning probably approximately correct maximin strategies in simulation-based games with infinite strategy spaces. In Autonomous Agents and Multi-Agent Systems, pages 834–842, 2020.
- Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, May 2017.
- Ran Raz and Shmuel Safra. A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP. In Proceedings of the Annual Symposium on Theory of Computing (STOC), pages 475–484, 1997.
- Aviad Rubinstein. Settling the complexity of computing approximate two-player Nash equilibria. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS), pages 258–265, 2016.
- Tuomas Sandholm and Satinder Singh. Lossy stochastic game abstraction with bounds. In Proceedings of the ACM Conference on Electronic Commerce (EC), 2012.
- David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484, 2016.
- Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Pérolat, Karl Tuyls, Rémi Munos, and Michael Bowling. Actor-critic policy optimization in partially observable multiagent environments. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 3422–3435, 2018.
- Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354, 2019.
- Michael Wellman. Methods for empirical game-theoretic analysis (extended abstract). In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 1552–1555, 2006.
- Brian Hu Zhang and Tuomas Sandholm. Sparsified linear programming for zero-sum equilibrium finding. In International Conference on Machine Learning (ICML), 2020.
- Yichi Zhou, Jialian Li, and Jun Zhu. Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information. In International Conference on Learning Representations, 2020.
