Solving Imperfect-Information Games via Discounted Regret Minimization

National Conference on Artificial Intelligence (AAAI), 2019.

DOI: https://doi.org/10.1609/aaai.v33i01.33011829

Abstract:

Counterfactual regret minimization (CFR) is a family of iterative algorithms that are the most popular and, in practice, fastest approach to approximately solving large imperfect-information games. In this paper we introduce novel CFR variants that 1) discount regrets from earlier iterations in various ways (in some cases differently for ...
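
To make the core idea concrete, below is a minimal sketch of regret matching (Hart and Mas-Colell 2000), the per-decision regret minimizer that CFR runs at every information set; the discounting variants change how the accumulated regrets here are weighted over time. This is an illustrative toy with invented function names, not the authors' implementation.

```python
import numpy as np

def rm_strategy(cum_regret):
    """Regret matching: play actions in proportion to positive cumulative regret."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    if total > 0.0:
        return pos / total
    # No positive regret yet: fall back to the uniform strategy.
    return np.full(len(cum_regret), 1.0 / len(cum_regret))

def rm_update(cum_regret, action_utils, sigma):
    """Accumulate instantaneous regret: how much better each action would
    have done than the strategy sigma that was actually played this iteration."""
    return cum_regret + (action_utils - action_utils @ sigma)
```
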

Introduction
  • Imperfect-information games model strategic interactions between players that have hidden information, such as in negotiations, cybersecurity, and auctions.
  • For extremely large imperfect-information games that cannot fit in a linear program of manageable size, typically iterative algorithms are used to approximate an equilibrium.
  • CFR+ was used to essentially solve heads-up limit Texas hold’em poker (Bowling et al 2015) and was used to approximately solve heads-up no-limit Texas hold’em (HUNL) endgames in Libratus, which defeated HUNL top professionals (Brown and Sandholm 2017c; 2017b).
Highlights
  • Some combinations of our ideas perform significantly better than CFR+ while others perform worse than it
  • Our experiments show that LCFR can dramatically improve performance over CFR+ over reasonable time horizons in certain games
  • We found that NH did worse than regret matching (RM) in all HUNL subgames
  • We introduced variants of CFR that discount prior iterations, leading to stronger performance than the prior state-of-the-art CFR+ in settings that involve large mistakes (a sketch of the CFR+ regret update follows this list)
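
For contrast with the discounting variants, here is a sketch of the regret-matching+ update underlying CFR+ (Tammelin 2014): instead of discounting, it floors accumulated regrets at zero, which is why a deep negative-regret deficit from a large early mistake lingers in plain CFR but not in CFR+. Again an illustrative sketch with invented names, not the authors' code.

```python
import numpy as np

def rm_plus_update(cum_regret, action_utils, sigma):
    """RM+ update used by CFR+: add instantaneous regret, then clamp at zero
    so no action can accumulate a negative-regret deficit."""
    inst = action_utils - action_utils @ sigma
    return np.maximum(cum_regret + inst, 0.0)
```
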
Methods
  • Experiments on Regret Discounting and Weighted Averaging: the authors' experiments are run for 32,768 iterations for HUNL subgames and 8,192 iterations for Goofspiel.
  • Since all the algorithms tested only converge to an ε-equilibrium rather than calculating an exact equilibrium, it is up to the user to decide when a solution is sufficiently converged to terminate a run.
  • This is usually after 100–1,000 iterations (Brown and Sandholm 2017c; Moravčík et al 2017).
  • All the experiments use the alternating-updates form of CFR, illustrated on a toy game below.
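
As an illustration of alternating updates, the sketch below runs regret matching with alternating updates on matching pennies, a toy normal-form game rather than the extensive-form games used in the paper: each iteration updates player 1's regrets against player 2's current strategy, then player 2's regrets against player 1's just-updated strategy.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])  # row player's payoffs in matching pennies

def strategy(cum_regret):
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    return pos / total if total > 0.0 else np.full(len(cum_regret), 1.0 / len(cum_regret))

r1, r2 = np.zeros(2), np.zeros(2)  # cumulative regrets per player
s1, s2 = np.zeros(2), np.zeros(2)  # cumulative strategies (for the average)
for t in range(10_000):
    # Player 1 updates first, against player 2's current strategy...
    x = strategy(r1)
    u1 = A @ strategy(r2)          # expected utility of each row action
    r1 += u1 - x @ u1
    s1 += x
    # ...then player 2 updates against player 1's just-updated strategy.
    y = strategy(r2)
    u2 = -(A.T @ strategy(r1))     # zero-sum: column player's utilities
    r2 += u2 - y @ u2
    s2 += y

print(s1 / s1.sum(), s2 / s2.sum())  # average strategies approach (0.5, 0.5)
```

Swapping the RM+ flooring or the discounting rescalings sketched elsewhere in this page into this loop reproduces the flavor of the variants the paper compares.
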
Results
  • Some combinations of the ideas perform significantly better than CFR+ while others perform worse than it.
  • The authors' experiments show that LCFR can dramatically improve performance over CFR+ over reasonable time horizons in certain games; a sketch of LCFR's linear weighting follows
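
LCFR (linear CFR) weights iteration t's regret and average-strategy contributions proportionally to t. A minimal sketch of one way to implement this, using the observation that linear weighting is equivalent to a constant-factor rescaling after each iteration (names are illustrative):

```python
def lcfr_rescale(cum_regret, cum_strategy, t):
    """After iteration t (t >= 1), multiply the accumulated regrets and the
    average-strategy numerator by t / (t + 1). Unrolled over T iterations,
    iteration t's contribution ends up weighted proportionally to t."""
    w = t / (t + 1)
    return cum_regret * w, cum_strategy * w
```
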
Conclusion
  • The authors introduced variants of CFR that discount prior iterations, leading to stronger performance than the prior state-of-the-art CFR+ in settings that involve large mistakes. In particular, the (3/2, 0, 2) variant, which the paper calls DCFR, matched or outperformed CFR+ in all settings.
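
For reference, a sketch of the discount multipliers of the (3/2, 0, 2) variant (DCFR): at the end of iteration t, positive accumulated regrets are multiplied by t^α/(t^α + 1), negative ones by t^β/(t^β + 1), and contributions to the average strategy by (t/(t+1))^γ.

```python
def dcfr_multipliers(t, alpha=1.5, beta=0.0, gamma=2.0):
    """Discount multipliers of DCFR(alpha, beta, gamma), applied at the end
    of iteration t (t >= 1); the defaults give the (3/2, 0, 2) variant."""
    pos = t**alpha / (t**alpha + 1.0)  # scales positive accumulated regrets
    neg = t**beta / (t**beta + 1.0)    # scales negative ones (1/2 when beta = 0)
    avg = (t / (t + 1.0))**gamma       # scales average-strategy contributions
    return pos, neg, avg
```
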
Funding
  • This material is based on work supported by the National Science Foundation under grants IIS-1718457, IIS-1617590, and CCF-1733556, and the ARO under award W911NF-17-1-0082.
  • Noam is also sponsored by an Open Philanthropy Project AI Fellowship and a Tencent AI Lab Fellowship.
Reference
  • Bowling, M.; Burch, N.; Johanson, M.; and Tammelin, O. 2015. Heads-up limit hold'em poker is solved. Science 347(6218):145–149.
  • Brown, N., and Sandholm, T. 2014. Regret transfer and parameter optimization. In AAAI, 594–601.
  • Brown, N., and Sandholm, T. 2015a. Regret-based pruning in extensive-form games. In NIPS, 1972–1980.
  • Brown, N., and Sandholm, T. 2015b. Simultaneous abstraction and equilibrium finding in games. In International Joint Conference on Artificial Intelligence (IJCAI).
  • Brown, N., and Sandholm, T. 2017a. Reduced space and faster convergence in imperfect-information games via pruning. In International Conference on Machine Learning (ICML).
  • Brown, N., and Sandholm, T. 2017b. Safe and nested subgame solving for imperfect-information games. In Advances in Neural Information Processing Systems (NIPS), 689–699.
  • Brown, N., and Sandholm, T. 2017c. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science eaao1733.
  • Brown, N.; Kroer, C.; and Sandholm, T. 2017. Dynamic thresholding and pruning for regret minimization. In AAAI Conference on Artificial Intelligence (AAAI), 421–429.
  • Brown, N.; Sandholm, T.; and Amos, B. 2018. Depth-limited solving for imperfect-information games. In Advances in Neural Information Processing Systems (NeurIPS).
  • Burch, N.; Moravčík, M.; and Schmid, M. 2018. Revisiting CFR+ and alternating updates. arXiv preprint arXiv:1810.11542.
  • Burch, N. 2017. Time and Space: Why Imperfect Information Games are Hard. Ph.D. Dissertation, University of Alberta.
  • Cesa-Bianchi, N., and Lugosi, G. 2006. Prediction, Learning, and Games. Cambridge University Press.
  • Chaudhuri, K.; Freund, Y.; and Hsu, D. J. 2009. A parameter-free hedging algorithm. In Advances in Neural Information Processing Systems (NIPS), 297–305.
  • Freund, Y., and Schapire, R. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55(1):119–139.
  • Gibson, R.; Lanctot, M.; Burch, N.; Szafron, D.; and Bowling, M. 2012. Generalized sampling and variance in counterfactual regret minimization. In AAAI Conference on Artificial Intelligence (AAAI), 1355–1361.
  • Hart, S., and Mas-Colell, A. 2000. A simple adaptive procedure leading to correlated equilibrium. Econometrica 68:1127–1150.
  • Hashimoto, J.; Kishimoto, A.; Yoshizoe, K.; and Ikeda, K. 2011. Accelerated UCT and its application to two-player games. In Advances in Computer Games, 1–12. Springer.
  • Heinrich, J.; Lanctot, M.; and Silver, D. 2015. Fictitious self-play in extensive-form games. In ICML, 805–813.
  • Hoda, S.; Gilpin, A.; Peña, J.; and Sandholm, T. 2010. Smoothing techniques for computing Nash equilibria of sequential games. Mathematics of Operations Research 35(2):494–512. Conference version appeared in WINE-07.
  • Jackson, E. 2017. Targeted CFR. In AAAI Workshop on Computer Poker and Imperfect Information.
  • Johanson, M.; Waugh, K.; Bowling, M.; and Zinkevich, M. 2011. Accelerating best response calculation in large extensive games. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 258–265.
  • Kroer, C.; Waugh, K.; Kılınç-Karzan, F.; and Sandholm, T. 2015. Faster first-order methods for extensive-form game solving. In Proceedings of the ACM Conference on Economics and Computation (EC), 817–834. ACM.
  • Lanctot, M.; Waugh, K.; Zinkevich, M.; and Bowling, M. 2009. Monte Carlo sampling for regret minimization in extensive games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 1078–1086.
  • Littlestone, N., and Warmuth, M. K. 1994. The weighted majority algorithm. Information and Computation 108(2):212–261.
  • Moravčík, M.; Schmid, M.; Burch, N.; Lisý, V.; Morrill, D.; Bard, N.; Davis, T.; Waugh, K.; Johanson, M.; and Bowling, M. 2017. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science.
  • Nash, J. 1950. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences 36:48–49.
  • Nesterov, Y. 2005. Excessive gap technique in nonsmooth convex minimization. SIAM Journal of Optimization 16(1):235–249.
  • Pays, F. 2014. An interior point approach to large games of incomplete information. In AAAI Computer Poker Workshop.
  • Syrgkanis, V.; Agarwal, A.; Luo, H.; and Schapire, R. E. 2015. Fast convergence of regularized learning in games. In Neural Information Processing Systems (NIPS), 2989–2997.
  • Tammelin, O.; Burch, N.; Johanson, M.; and Bowling, M. 2015. Solving heads-up limit Texas hold'em. In IJCAI.
  • Tammelin, O. 2014. Solving large imperfect information games using CFR+. arXiv preprint arXiv:1407.5042.
  • Waugh, K. 2009. Abstraction in large extensive games. Master's thesis, University of Alberta.
  • Zinkevich, M.; Johanson, M.; Bowling, M. H.; and Piccione, C. 2007. Regret minimization in games with incomplete information. In Neural Information Processing Systems (NIPS), 1729–1736.
Best Paper of AAAI, 2019