# Complexity and Algorithms for Exploiting Quantal Opponents in Large Two-Player Games

Abstract:

Solution concepts of traditional game theory assume entirely rational players; therefore, their ability to exploit subrational opponents is limited. One type of subrationality that describes human behavior well is the quantal response. While there exist algorithms for computing solutions against quantal opponents, they either do not sca...

Introduction

- Extensive-form games are a powerful model able to describe recreational games, such as poker, as well as real-world situations from physical or network security.
- The most common model of bounded rationality in humans is the quantal response (QR) model (McKelvey and Palfrey 1995, 1998).
- Multiple experiments identified it as a good predictor of human behavior in games (Yang, Ordonez, and Tambe 2012; Haile, Hortacsu, and Kosenok 2008).
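
The quantal response model mentioned above can be illustrated with a short sketch. It assumes the common logit form, in which an action is chosen with probability proportional to the exponential of its expected utility scaled by a rationality parameter. The function name `logit_quantal_response` and the parameter name `rationality` are illustrative and do not come from the paper.

```python
import math


def logit_quantal_response(utilities, rationality):
    """Return action probabilities proportional to exp(rationality * utility).

    Assumption: this is the standard logit form of quantal response;
    the names used here are hypothetical, not the authors' notation.
    """
    # Subtract the largest scaled utility before exponentiating to avoid overflow.
    peak = max(rationality * u for u in utilities)
    weights = [math.exp(rationality * u - peak) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    expected_utilities = [1.0, 0.5, 0.0]
    # rationality = 0 gives uniform play; larger values concentrate on the best action.
    for lam in (0.0, 1.0, 10.0):
        probs = logit_quantal_response(expected_utilities, lam)
        print(lam, [round(p, 3) for p in probs])
```

With a rationality of zero the player randomizes uniformly, and as the parameter grows the response approaches a best response, which is why the model can interpolate between random and fully rational play.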

Highlights

- Extensive-form games are a powerful model able to describe recreational games, such as poker, as well as real-world situations from physical or network security
- Our contributions are: 1) We analyze the relationship and properties of two solution concepts with quantal opponents that naturally arise from Nash equilibrium (QNE) and Stackelberg equilibrium (QSE). 2) We prove that computing Quantal Nash Equilibrium (QNE) is PPAD-hard even in normal-form games (NFGs), and computing Quantal Stackelberg Equilibrium (QSE) in extensive-form games (EFGs) is NP-hard
- Even though our main focus is on extensive-form games, we study the concepts in normal-form games, which can be seen as their conceptually simpler special case
- We show that, contrary to their fully rational counterparts, QNE differs from QSE even in zero-sum games
- We focus on QNE and, based on an empirical evaluation, we claim that regret-minimization algorithms converge to QNE in both NFGs and EFGs
- We call this optimal strategy Quantal Stackelberg Equilibrium (QSE) and show that natural adaptations of existing algorithms do not lead to QSE, but rather to a different solution we call Quantal Nash Equilibrium (QNE)
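
The Stackelberg-style concept above can be made concrete on a tiny example. The sketch below assumes that QSE means the rational player commits to the strategy that maximizes its own utility given that the opponent answers with a logit quantal response to that commitment. It uses a brute-force grid search over a 2x2 zero-sum game, which is an illustration only and not one of the paper's algorithms; the payoff matrix, the rationality value, and all function names are hypothetical.

```python
import math


def logit_qr(utilities, lam):
    """Logit quantal response: probabilities proportional to exp(lam * utility)."""
    peak = max(lam * u for u in utilities)
    weights = [math.exp(lam * u - peak) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def leader_value(p, payoff, lam):
    """Row player's expected utility when it plays (p, 1 - p) in a 2x2 zero-sum
    game and the column player answers with a logit quantal response."""
    sigma = [p, 1.0 - p]
    # Row player's expected utility for each column under the commitment.
    col_values = [sum(sigma[i] * payoff[i][j] for i in range(2)) for j in range(2)]
    # Zero-sum assumption: the column player's utilities are the negated values.
    response = logit_qr([-v for v in col_values], lam)
    return sum(response[j] * col_values[j] for j in range(2))


def grid_search_commitment(payoff, lam, steps=1000):
    """Return the best grid point p and its value for the committing row player."""
    best_p, best_v = 0.0, leader_value(0.0, payoff, lam)
    for k in range(1, steps + 1):
        p = k / steps
        v = leader_value(p, payoff, lam)
        if v > best_v:
            best_p, best_v = p, v
    return best_p, best_v


if __name__ == "__main__":
    payoff = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical row-player payoffs
    print(grid_search_commitment(payoff, lam=1.0))
```

In this hypothetical game the Nash strategy of the row player (p = 0.4) guarantees the game value of 0.2 against any response, so the best commitment found by the search is worth at least that much against the quantal opponent.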

Methods

- For all experiments except Goofspiel 7, the authors use Python 3.7.
- LP computations are done using Gurobi 8.1.1, and the experiments were run on an Intel i7 1.8GHz CPU with 8GB RAM.
- The Goofspiel experiment was implemented in C++ and run on a machine with 24 cores/48 threads at 3.2GHz (2 x Intel Xeon Scalable Gold 6146) and 384GB of RAM.
- The authors wanted to measure the scalability and performance of the proposed solutions and the baseline.

Conclusion

- Bounded rationality models are crucial for applications that involve human decision-makers.
- Artificial intelligence applications in real-world problems pose a novel challenge of computing optimal strategies for an entirely rational system interacting with bounded-rational humans.
- The authors call this optimal strategy Quantal Stackelberg Equilibrium (QSE) and show that natural adaptations of existing algorithms do not lead to QSE, but rather to a different solution the authors call Quantal Nash Equilibrium (QNE).
- The authors propose a variant of counterfactual regret minimization which, based on the experimental evaluation, scales to large games and computes strategies that outperform QNE against both the quantal response opponent and the perfectly rational opponent.
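
To give a feel for regret minimization against a quantal opponent, the following is a minimal sketch in a normal-form zero-sum game. It uses plain regret matching (Hart and Mas-Colell 2000) for the rational player and is not the authors' counterfactual regret minimization variant. The choice to let the opponent respond with a logit quantal response to the rational player's current strategy in each iteration is an assumption made for illustration, as are all names in the code.

```python
import math


def logit_qr(utilities, lam):
    """Logit quantal response: probabilities proportional to exp(lam * utility)."""
    peak = max(lam * u for u in utilities)
    weights = [math.exp(lam * u - peak) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def regret_matching_strategy(regrets):
    """Play actions in proportion to their positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [r / total for r in positive]


def regret_matching_vs_quantal(payoff, lam, iterations=5000):
    """Return the row player's average strategy after regret matching against an
    opponent that (by assumption) answers the current strategy with logit QR."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    regrets = [0.0] * n_rows
    strategy_sum = [0.0] * n_rows
    for _ in range(iterations):
        sigma = regret_matching_strategy(regrets)
        # Row player's expected utility per column; the opponent's is the negation.
        col_values = [sum(sigma[i] * payoff[i][j] for i in range(n_rows))
                      for j in range(n_cols)]
        response = logit_qr([-v for v in col_values], lam)
        # Value of each row action against the opponent's quantal response.
        action_values = [sum(payoff[i][j] * response[j] for j in range(n_cols))
                         for i in range(n_rows)]
        expected = sum(sigma[i] * action_values[i] for i in range(n_rows))
        for i in range(n_rows):
            regrets[i] += action_values[i] - expected
            strategy_sum[i] += sigma[i]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]


if __name__ == "__main__":
    payoff = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical row-player payoffs
    print(regret_matching_vs_quantal(payoff, lam=1.0))
```

The sketch returns the average strategy, as is usual for regret-minimization methods; whether such dynamics reach QNE is an empirical claim of the paper and is not something this toy code establishes.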

Summary

## Objectives:

This paper aims to analyze and propose scalable algorithms for computing effective and robust strategies against a quantal opponent in normal-form and extensive-form games.

- The authors aim to find a parameter α of the combination that maximizes the utility against LQR.
- The authors aim to choose the parameter p such that it maximizes the expected payoff.
- The authors aim to show that RQR performs even better than an optimized NE.
- The authors aim to prove that for any σ and the corresponding vector of complementary probabilities of playing the second actions 1 − σ it holds that

Reference

- Bard, N.; Johanson, M.; Burch, N.; and Bowling, M. 2013. Online implicit agent modelling. In Proceedings of the 2013 International Conference on Autonomous Agents and Multiagent Systems, 255–262.
- Basak, A.; Černý, J.; Gutierrez, M.; Curtis, S.; Kamhoua, C.; Jones, D.; Bošanský, B.; and Kiekintveld, C. 2018. An initial study of targeted personality models in the FlipIt game. In International Conference on Decision and Game Theory for Security, 623–636. Springer.
- Boyd, S.; and Vandenberghe, L. 2004. Convex Optimization. Cambridge University Press.
- Brown, N.; and Sandholm, T. 2018. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359(6374): 418–424.
- Černý, J.; Lisý, V.; Bošanský, B.; and An, B. 2020. Dinkelbach-Type Algorithm for Computing Quantal Stackelberg Equilibrium. In Bessiere, C., ed., Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, 246–253. Main track.
- Conitzer, V.; and Sandholm, T. 2006. Computing the optimal strategy to commit to. In Proceedings of the 7th ACM conference on Electronic commerce, 82–90.
- Daskalakis, C.; Goldberg, P. W.; and Papadimitriou, C. H. 2009. The complexity of computing a Nash equilibrium. SIAM Journal on Computing 39(1): 195–259.
- Davis, T.; Burch, N.; and Bowling, M. 2014. Using response functions to measure strategy strength. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
- Delle Fave, F. M.; Jiang, A. X.; Yin, Z.; Zhang, C.; Tambe, M.; Kraus, S.; and Sullivan, J. P. 2014. Game-theoretic patrolling with dynamic execution uncertainty and a case study on a real transit system. Journal of Artificial Intelligence Research 50: 321–367.
- Fang, F.; Nguyen, T. H.; Pickles, R.; Lam, W. Y.; Clements, G. R.; An, B.; Singh, A.; Schwedock, B. C.; Tambe, M.; and Lemieux, A. 2017. PAWS-A Deployed Game-Theoretic Application to Combat Poaching. AI Magazine 38(1): 23–36.
- Farina, G.; Kroer, C.; and Sandholm, T. 2019. Online convex optimization for sequential decision processes and extensive-form games. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 1917–1925.
- Haile, P. A.; Hortacsu, A.; and Kosenok, G. 2008. On the empirical content of quantal response equilibrium. American Economic Review 98(1): 180–200.
- Hart, S.; and Mas-Colell, A. 2000. A simple adaptive procedure leading to correlated equilibrium. Econometrica 68(5): 1127–1150.
- Johanson, M.; and Bowling, M. 2009. Data biased robust counter strategies. In Artificial Intelligence and Statistics, 264–271.
- Johanson, M.; Zinkevich, M.; and Bowling, M. 2008. Computing robust counter-strategies. In Advances in neural information processing systems, 721–728.
- Letchford, J.; and Conitzer, V. 2010. Computing optimal strategies to commit to in extensive-form games. In Proceedings of the 11th ACM conference on Electronic commerce, 83–92.
- Lockhart, E.; Lanctot, M.; Pérolat, J.; Lespiau, J.-B.; Morrill, D.; Timbers, F.; and Tuyls, K. 2019. Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, 464–470.
- McFadden, D. L. 1976. Quantal choice analysis: A survey. In Annals of Economic and Social Measurement, Volume 5, number 4, 363–390. NBER.
- McKelvey, R. D.; and Palfrey, T. R. 1995. Quantal response equilibria for normal form games. Games and Economic Behavior 10(1): 6–38.
- McKelvey, R. D.; and Palfrey, T. R. 1998. Quantal response equilibria for extensive form games. Experimental Economics 1(1): 9–41.
- Moravčík, M.; Schmid, M.; Burch, N.; Lisý, V.; Morrill, D.; Bard, N.; Davis, T.; Waugh, K.; Johanson, M.; and Bowling, M. 2017. DeepStack: Expert-level artificial intelligence in Heads-Up No-Limit Poker. Science 356(6337): 508–513.
- Nudelman, E.; Wortman, J.; Shoham, Y.; and Leyton-Brown, K. 2004. Run the GAMUT: A comprehensive approach to evaluating game-theoretic algorithms. In Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multiagent Systems, volume 4, 880–887.
- Pita, J.; Jain, M.; Marecki, J.; Ordonez, F.; Portway, C.; Tambe, M.; Western, C.; Paruchuri, P.; and Kraus, S. 2008. Deployed ARMOR protection: The application of a game theoretic model for security at the Los Angeles International Airport. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, 125–132.
- Sandholm, T.; Gilpin, A.; and Conitzer, V. 2005. Mixed-integer programming methods for finding Nash equilibria. In AAAI, 495–501.
- Tambe, M. 2011. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. New York, NY, USA: Cambridge University Press. ISBN 1107096421, 9781107096424.
- Turocy, T. L. 2005. A dynamic homotopy interpretation of the logistic quantal response equilibrium correspondence. Games and Economic Behavior 51(2): 243–263.
- Yang, R.; Ordonez, F.; and Tambe, M. 2012. Computing optimal strategy against quantal response in security games. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 2, 847–854.
- Zinkevich, M.; Johanson, M.; Bowling, M.; and Piccione, C. 2008. Regret minimization in games with incomplete information. In Advances in Neural Information Processing Systems, 1729–1736.
- Computing an ε-Nash equilibrium in G is PPAD-complete (Daskalakis, Goldberg, and Papadimitriou 2009). We show that computing QNE is PPAD-hard by reducing the problem of finding
