Solving Imperfect-Information Games via Discounted Regret Minimization
AAAI Conference on Artificial Intelligence (AAAI), 2019.
Abstract:
Counterfactual regret minimization (CFR) is a family of iterative algorithms that are the most popular and, in practice, fastest approach to approximately solving large imperfect-information games. In this paper we introduce novel CFR variants that 1) discount regrets from earlier iterations in various ways (in some cases differently for …
Introduction
- Imperfect-information games model strategic interactions between players that have hidden information, such as in negotiations, cybersecurity, and auctions.
- For extremely large imperfect-information games that cannot fit in a linear program of manageable size, iterative algorithms are typically used to approximate an equilibrium.
- CFR+ was used to essentially solve heads-up limit Texas hold’em poker (Bowling et al. 2015) and was used to approximately solve heads-up no-limit Texas hold’em (HUNL) endgames in Libratus, which defeated top HUNL professionals (Brown and Sandholm 2017c; 2017b).
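At each information set, CFR builds its next strategy with regret matching (Hart and Mas-Colell 2000). A minimal sketch of that building block (not the authors' implementation): each action is played with probability proportional to its positive cumulative regret.

```python
import numpy as np

def regret_matching(cum_regrets):
    # Regret matching (Hart and Mas-Colell 2000): play each action with
    # probability proportional to its positive cumulative regret; if no
    # action has positive regret, fall back to the uniform strategy.
    positive = np.maximum(cum_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cum_regrets), 1.0 / len(cum_regrets))

# e.g. cumulative regrets (3, 1, -2) -> strategy (0.75, 0.25, 0.0)
print(regret_matching(np.array([3.0, 1.0, -2.0])))
```

The uniform fallback matters: early in a run every regret can be non-positive, and the update must still return a valid probability distribution.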
Highlights
- Some combinations of our ideas perform significantly better than CFR+ while others perform worse than it
- Our experiments show that LCFR can dramatically improve performance over CFR+ over reasonable time horizons in certain games
- We found that NH did worse than regret matching (RM) in all HUNL subgames
- We introduced variants of CFR that discount prior iterations, leading to stronger performance than the prior state-of-the-art, CFR+, in settings that involve large mistakes
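The LCFR variant mentioned above weights iteration t's regrets and its contribution to the average strategy in proportion to t. One equivalent way to implement that, sketched here with illustrative accumulator names (not the authors' code), is to decay the running sums by t/(t+1) each iteration:

```python
def lcfr_accumulate(cum_regret, cum_strategy, inst_regret, strategy, t):
    # Add iteration t's contribution, then decay by t/(t+1). Unrolling the
    # recursion shows iteration s ends up with weight proportional to s,
    # i.e. linear weighting of both regrets and the average strategy.
    w = t / (t + 1)
    cum_regret = (cum_regret + inst_regret) * w
    cum_strategy = (cum_strategy + strategy) * w
    return cum_regret, cum_strategy
```

After two iterations with instantaneous regrets r1 and r2, the accumulator holds (r1 + 2·r2)/3, so iteration 2 counts twice as much as iteration 1; early, mistake-heavy iterations fade accordingly.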
Methods
- Experiments on Regret Discounting and Weighted Averaging
- The authors' experiments are run for 32,768 iterations for HUNL subgames and 8,192 iterations for Goofspiel.
- Since all the algorithms tested only converge to an ε-equilibrium rather than calculating an exact equilibrium, it is up to the user to decide when a solution is sufficiently converged to terminate a run.
- This is usually after 100–1,000 iterations (Brown and Sandholm 2017c; Moravčík et al. 2017).
- All the experiments use the alternating-updates form of CFR.
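To make the alternating-updates loop concrete, here is a toy self-play sketch on rock-paper-scissors rather than the paper's HUNL subgames: within one iteration, player 1 updates against player 2's current strategy, and player 2 then updates against player 1's already-updated strategy. The average strategies converge toward the game's uniform equilibrium.

```python
import numpy as np

A = np.array([[0., -1., 1.],    # rock-paper-scissors payoff
              [1., 0., -1.],    # matrix for player 1
              [-1., 1., 0.]])   # (zero-sum: player 2 gets -A)

def rm(r):
    # Regret matching: mass proportional to positive regret, else uniform.
    pos = np.maximum(r, 0.)
    s = pos.sum()
    return pos / s if s > 0 else np.full(len(r), 1. / len(r))

r1 = np.array([1., 0., 0.])     # small perturbation so play leaves uniform
r2 = np.zeros(3)
avg1 = np.zeros(3)
avg2 = np.zeros(3)
for t in range(1, 20001):
    s1, s2 = rm(r1), rm(r2)
    u1 = A @ s2                 # player 1 updates against s2
    r1 += u1 - s1 @ u1
    avg1 += s1
    s1 = rm(r1)                 # alternating: player 2 sees the new s1
    u2 = -(s1 @ A)
    r2 += u2 - s2 @ u2
    avg2 += s2
avg1 /= avg1.sum()
avg2 /= avg2.sum()
```

In two-player zero-sum games the *average* strategies, not the current ones, approach equilibrium, which is why the loop maintains `avg1` and `avg2` separately.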
Results
- Some combinations of the ideas perform significantly better than CFR+ while others perform worse than it.
- The authors' experiments show that LCFR can dramatically improve performance over CFR+ over reasonable time horizons in certain games.
Conclusion
- The authors introduced variants of CFR that discount prior iterations, leading to stronger performance than the prior state-of-the-art, CFR+, in settings that involve large mistakes. In particular, the (α = 3/2, β = 0, γ = 2) variant matched or outperformed CFR+ in all settings.
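A sketch of the discounting rule behind that variant (called DCFR in the paper): after iteration t, positive accumulated regrets are scaled by t^α/(t^α + 1), negative ones by t^β/(t^β + 1), and the average-strategy accumulator by (t/(t+1))^γ. Function and variable names here are illustrative, not from the authors' code.

```python
import numpy as np

def dcfr_discount(cum_regret, cum_strategy, t, alpha=1.5, beta=0.0, gamma=2.0):
    # Discount factors applied after iteration t: positive regrets decay
    # slowly (alpha = 3/2), negative regrets decay fast (beta = 0 halves
    # them every iteration), and average-strategy contributions are
    # weighted quadratically toward recent iterations (gamma = 2).
    pos_w = t ** alpha / (t ** alpha + 1)
    neg_w = t ** beta / (t ** beta + 1)
    cum_regret = np.where(cum_regret > 0, cum_regret * pos_w, cum_regret * neg_w)
    cum_strategy = cum_strategy * (t / (t + 1)) ** gamma
    return cum_regret, cum_strategy
```

For example, with t = 1 a regret vector (2, −2) becomes (1, −1), and the strategy accumulator is quartered. Decaying negative regret quickly is what lets the algorithm recover fast from the large early mistakes the conclusion refers to.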
Funding
- This material is based on work supported by the National Science Foundation under grants IIS-1718457, IIS-1617590, and CCF-1733556, and the ARO under award W911NF-17-1-0082
- Noam is also sponsored by an Open Philanthropy Project AI Fellowship and a Tencent AI Lab Fellowship
Reference
- Bowling, M.; Burch, N.; Johanson, M.; and Tammelin, O. 2015. Heads-up limit hold’em poker is solved. Science 347(6218):145–149.
- Brown, N., and Sandholm, T. 2014. Regret transfer and parameter optimization. In AAAI, 594–601.
- Brown, N., and Sandholm, T. 2015a. Regret-based pruning in extensive-form games. In NIPS, 1972–1980.
- Brown, N., and Sandholm, T. 2015b. Simultaneous abstraction and equilibrium finding in games. In International Joint Conference on Artificial Intelligence (IJCAI).
- Brown, N., and Sandholm, T. 2017a. Reduced space and faster convergence in imperfect-information games via pruning. In International Conference on Machine Learning.
- Brown, N., and Sandholm, T. 2017b. Safe and nested subgame solving for imperfect-information games. In Advances in Neural Information Processing Systems, 689–699.
- Brown, N., and Sandholm, T. 2017c. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science eaao1733.
- Brown, N.; Kroer, C.; and Sandholm, T. 2017. Dynamic thresholding and pruning for regret minimization. In AAAI Conference on Artificial Intelligence (AAAI), 421–429.
- Brown, N.; Sandholm, T.; and Amos, B. 2018. Depth-limited solving for imperfect-information games. In Advances in Neural Information Processing Systems.
- Burch, N.; Moravčík, M.; and Schmid, M. 2018. Revisiting CFR+ and alternating updates. arXiv preprint arXiv:1810.11542.
- Burch, N. 2017. Time and Space: Why Imperfect Information Games are Hard. Ph.D. Dissertation, University of Alberta.
- Cesa-Bianchi, N., and Lugosi, G. 2006. Prediction, Learning, and Games. Cambridge University Press.
- Chaudhuri, K.; Freund, Y.; and Hsu, D. J. 2009. A parameter-free hedging algorithm. In Advances in neural information processing systems, 297–305.
- Freund, Y., and Schapire, R. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences.
- Gibson, R.; Lanctot, M.; Burch, N.; Szafron, D.; and Bowling, M. 2012. Generalized sampling and variance in counterfactual regret minimization. In AAAI Conference on Artificial Intelligence, 1355–1361.
- Hart, S., and Mas-Colell, A. 2000. A simple adaptive procedure leading to correlated equilibrium. Econometrica 68:1127–1150.
- Hashimoto, J.; Kishimoto, A.; Yoshizoe, K.; and Ikeda, K. 2011. Accelerated UCT and its application to two-player games. In Advances in Computer Games, 1–12. Springer.
- Heinrich, J.; Lanctot, M.; and Silver, D. 2015. Fictitious self-play in extensive-form games. In ICML, 805–813.
- Hoda, S.; Gilpin, A.; Pena, J.; and Sandholm, T. 2010. Smoothing techniques for computing Nash equilibria of sequential games. Mathematics of Operations Research 35(2):494–512. Conference version appeared in WINE-07.
- Jackson, E. 2017. Targeted CFR. In AAAI Workshop on Computer Poker and Imperfect Information.
- Johanson, M.; Waugh, K.; Bowling, M.; and Zinkevich, M. 2011. Accelerating best response calculation in large extensive games. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 258–265.
- Kroer, C.; Waugh, K.; Kılınç-Karzan, F.; and Sandholm, T. 2015. Faster first-order methods for extensive-form game solving. In Proceedings of the ACM Conference on Economics and Computation (EC), 817–834. ACM.
- Lanctot, M.; Waugh, K.; Zinkevich, M.; and Bowling, M. 2009. Monte Carlo sampling for regret minimization in extensive games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 1078–1086.
- Littlestone, N., and Warmuth, M. K. 1994. The weighted majority algorithm. Information and Computation 108(2):212–261.
- Moravčík, M.; Schmid, M.; Burch, N.; Lisý, V.; Morrill, D.; Bard, N.; Davis, T.; Waugh, K.; Johanson, M.; and Bowling, M. 2017. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356(6337):508–513.
- Nash, J. 1950. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences 36:48–49.
- Nesterov, Y. 2005. Excessive gap technique in nonsmooth convex minimization. SIAM Journal of Optimization 16(1):235–249.
- Pays, F. 2014. An interior point approach to large games of incomplete information. In AAAI Computer Poker Workshop.
- Syrgkanis, V.; Agarwal, A.; Luo, H.; and Schapire, R. E. 2015. Fast convergence of regularized learning in games. In Neural Information Processing Systems, 2989–2997.
- Tammelin, O.; Burch, N.; Johanson, M.; and Bowling, M. 2015. Solving heads-up limit Texas hold’em. In IJCAI.
- Tammelin, O. 2014. Solving large imperfect information games using CFR+. arXiv preprint arXiv:1407.5042.
- Waugh, K. 2009. Abstraction in large extensive games. Master’s thesis, University of Alberta.
- Zinkevich, M.; Johanson, M.; Bowling, M. H.; and Piccione, C. 2007. Regret minimization in games with incomplete information. In Neural Information Processing Systems (NIPS), 1729–1736.
Best Paper of AAAI, 2019