# Computing Ex Ante Coordinated Team-Maxmin Equilibria in Zero-Sum Multiplayer Extensive-Form Games

Abstract:

Developing efficient algorithms to compute equilibria in multiplayer games remains an open challenge in computational game theory. In this paper, we focus on computing Team-Maxmin Equilibria with Coordination device (TMEsCor) in zero-sum multiplayer extensive-form games, where a team of players with ex ante coordination plays against an adversary. […]

Introduction

- One important problem in artificial intelligence is to design algorithms for agents to make complex decisions in an interactive environment (Russell and Norvig 2016).
- Researchers have made many efforts in non-cooperative two-player games, e.g., finding a Nash Equilibrium (NE) (Nash 1951; Von Stengel 1996; Zinkevich et al. 2008) and finding a Stackelberg equilibrium (Conitzer and Sandholm 2006).
- NEs are not unique in multiplayer games, which creates a barrier for each player to independently choose a strategy that forms an NE together with the strategies of the other players.

Highlights

- One important problem in artificial intelligence is to design algorithms for agents to make complex decisions in an interactive environment (Russell and Norvig 2016)
- This paper shows that, to compute equilibria, formulating the problem as a multilinear program solved with global optimization techniques can be dramatically faster than formulating it directly as a linear program.
- Our experiments show that the actual number of iterations at which CMB terminates is significantly smaller than the bound in Theorem 2, which could be because at least one TMECor has a small support set, as shown in Theorem 3.
- Multilinear Representation: to transform Problem (5) into a Mixed-Integer Linear Program (MILP), based on the properties of the integer variables in Problem (5), we develop our technique, Multilinear Representation (MR), which exactly represents multilinear terms without introducing new integer variables.
- Our experimental results show that it is almost impossible for Fictitious Team Play (FTP) to converge to an ε∆u-TMECor with very small ε (e.g., ε = 0.0001), let alone an exact TMECor.
- We propose an efficient algorithm to compute TMEsCor for zero-sum multiplayer Extensive-Form Games (EFGs).
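The MR technique itself is not spelled out in this summary. As a hedged illustration of the kind of exact linearization such MILP reformulations rely on, the sketch below verifies the textbook constraints that represent a product of binary variables with one continuous variable and no new integer variables (the function name `feasible_z` and the n = 3 instance are illustrative, not from the paper):

```python
from itertools import product

# Standard exact linearization of a product of binary variables:
# for binary x_1..x_n, the constraints
#     z <= x_i for all i,   z >= sum(x_i) - (n - 1),   0 <= z <= 1
# force z = x_1 * ... * x_n exactly, adding only the continuous
# variable z and no new integer variables.

def feasible_z(bits):
    """All z in {0, 1} satisfying the linear constraints at a binary point."""
    n = len(bits)
    return [z for z in (0, 1)
            if all(z <= b for b in bits) and z >= sum(bits) - (n - 1)]

# Verify exactness by brute force over every binary assignment (n = 3).
for bits in product([0, 1], repeat=3):
    prod_val = bits[0] * bits[1] * bits[2]
    assert feasible_z(bits) == [prod_val]
print("exact linearization verified for n = 3")
```

Because the representation is exact (not a relaxation), the resulting MILP has the same optimal solutions as the original multilinear program.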

Results

- TME is discussed in Section 3.
- It has been shown (Farina et al. 2018) that the team loses a large amount of utility by playing the TME strategy instead of the TMECor strategy in games 3K3–3K7.
- The authors report runtimes to compute TMEs and TMEsCor, and the team's utilities under each, for games 3K8–3K12 in Table 6.
- Computing a TME takes dramatically more time than computing a TMECor, and the team's utility in a TME is significantly lower than in a TMECor.

Conclusion

**Conclusion and Future Work**

In this paper, the authors propose an efficient algorithm to compute TMEsCor for zero-sum multiplayer EFGs.

- A tighter theoretical upper bound on the number of iterations could be explored, since CMB requires very few iterations in experiments.
- Given the difficulty of computing BRO, a reinforcement learning approach (Timbers et al. 2020) could be explored with the ART to speed it up.
- To avoid BRO entirely, a counterfactual regret minimization approach (Celli et al. 2019) could be explored with the ART.

Summary

## Objectives:

The authors aim to reduce the feasible solution space of w(σT) and solve the MILP efficiently (for simplicity, they say that they reduce the feasible space of the MILP).

- Table 1: In addition to the EFG tuple (N, A, H, L, χ, ρ, u, I), some notations used in Section 4.
- Table 2: Computing TMEsCor. From top to bottom, these games are increasingly hard to solve. '> nh' means that we terminate algorithms after n hours, which also indicates that they do not converge within n hours for the remaining cases. To solve large games on our machine, 3L5∗ has five cards, and team players do not take action 'raising' in 4L31 (6 cards) and 4L32.
- Table 3: Computing TMEsCor. Here, team players choose actions in information sets reached by sequence ∅ and then take action 'calling' in other information sets.
- Table 4: The number of iterations when CMB converges and the size of the support set of the team's strategy in the corresponding TMECor.
- Table 5: Computing ε∆u-TMEsCor. Games from top to bottom are 3K4, 3K8, 3K12, 3L3, 3L4, and 3L5, respectively; ∆u = 6 for 3Kr and ∆u = 21 for 3Lr (|L| ≈ 10^6 with |Σi| = 801 for 3L4, and |L| ≈ 3 · 10^6 with |Σi| = 1241 for 3L5). '> nh' means that FTP does not converge within n hours for the remaining cases.
- Table 6: The runtimes to compute TMEs and TMEsCor and the team's utilities in TMEs and TMEsCor.
- Table 7: Comparison with existing approaches to compute TMEsCor in terms of the new strategy representation, the number of integer variables in BRO (|L| is significantly larger than ∑_{i∈T∖{1}} |Σi| in extensive-form games), the compatibility of BRO with associated constraints, and the reduction of the feasible solution space of BRO.

Related work

- The Team-Maxmin Equilibrium (TME) (von Stengel and Koller 1997; Celli and Gatti 2018) is a solution concept close to the TMECor, where a team of players with the same utility function plays independently against an adversary. The TME in extensive-form games assumes each team member uses behavioral strategies, which comes at a high cost compared to the TMECor (Celli and Gatti 2018; Farina et al. 2018). Firstly, computing a TME, i.e., finding the optimal joint behavioral strategies of the team members, is FNP-hard and a non-convex optimization problem (Celli and Gatti 2018). Secondly, the team may lose a large amount of utility if it uses the TME instead of the TMECor (Celli and Gatti 2018; Farina et al. 2018).

Moreover, the team as a whole has imperfect recall due to the lack of communication between team members during the game, so behavioral strategies cannot capture the correlation between the normal-form strategies of team members (Farina et al. 2018). That is, in general, behavioral strategies are not realization-equivalent to the normal-form strategies induced by the coordination device, and this can cause a large loss of utility to the team (Farina et al. 2018).
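The gap between independent and coordinated team play can be seen in a toy game (not from the paper; the payoffs and names are illustrative): two team members each pick a bit, the adversary guesses a bit b, and the team earns 1 iff both members played the same bit v with b ≠ v. A shared coin (ex ante coordination) lets the team always match bits, doubling its maxmin value:

```python
# Toy game: team maxmin value with independent mixing (TME-style)
# vs. with a coordination device over joint actions (TMECor-style).

def independent_value(p, q):
    """Team payoff when members independently play 1 w.p. p and q."""
    vs_b0 = p * q                  # vs guess b=0: team wins iff both play 1
    vs_b1 = (1 - p) * (1 - q)      # vs guess b=1: team wins iff both play 0
    return min(vs_b0, vs_b1)       # adversary best-responds

def correlated_value(x00):
    """Team payoff when a device plays (0,0) w.p. x00 and (1,1) w.p. 1-x00."""
    return min(1 - x00, x00)       # adversary best-responds to the device

grid = [i / 1000 for i in range(1001)]
best_independent = max(independent_value(p, q) for p in grid for q in grid)
best_correlated = max(correlated_value(x) for x in grid)

assert abs(best_independent - 0.25) < 1e-3   # independent optimum: p = q = 1/2
assert abs(best_correlated - 0.5) < 1e-3     # coordinated optimum: fair coin
```

Mixing independently, the team can never guarantee matching bits, so its value is capped at 1/4; the coordination device removes exactly the correlation that behavioral strategies cannot express.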

Reference

- Abou Risk, N.; Szafron, D.; et al. 2010. Using counterfactual regret minimization to create competitive multiplayer poker agents. In AAMAS, 159–166.
- Bosansky, B.; Kiekintveld, C.; Lisy, V.; and Pechoucek, M. 2014. An exact double-oracle algorithm for zero-sum extensive-form games with imperfect information. Journal of Artificial Intelligence Research 51: 829–866.
- Brown, N.; and Sandholm, T. 2018. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359(6374): 418–424.
- Brown, N.; and Sandholm, T. 2019. Superhuman AI for multiplayer poker. Science 365(6456): 885–890. doi:10.1126/science.aay2400.
- Cai, Y.; and Daskalakis, C. 2011. On minmax theorems for multiplayer games. In SODA, 217–234.
- Celli, A.; and Gatti, N. 2018. Computational results for extensive-form adversarial team games. In AAAI, 965–972.
- Celli, A.; Marchesi, A.; Bianchi, T.; and Gatti, N. 2019. Learning to Correlate in Multi-Player General-Sum Sequential Games. In NeurIPS, 13055–13065.
- Chen, X.; and Deng, X. 2005. 3-Nash is PPAD-complete. In Electronic Colloquium on Computational Complexity, volume 134, 2–29.
- Conitzer, V.; and Sandholm, T. 2006. Computing the optimal strategy to commit to. In EC, 82–90.
- Farina, G.; Celli, A.; Gatti, N.; and Sandholm, T. 2018. Ex ante coordination and collusion in zero-sum multi-player extensive-form games. In NeurIPS, 9638–9648.
- McCarthy, S. M.; Tambe, M.; Kiekintveld, C.; Gore, M. L.; and Killion, A. 2016. Preventing Illegal Logging: Simultaneous Optimization of Resource Teams and Tactics for Security. In AAAI, 3880–3886.
- McCormick, G. P. 1976. Computability of global solutions to factorable nonconvex programs: Part I–Convex underestimating problems. Mathematical Programming 10(1): 147–175.
- McMahan, H. B.; Gordon, G. J.; and Blum, A. 2003. Planning in the presence of cost functions controlled by an adversary. In ICML, 536–543.
- Morrison, D. R.; Jacobson, S. H.; Sauppe, J. J.; and Sewell, E. C. 2016. Branch-and-bound algorithms: A survey of recent advances in searching, branching, and pruning. Discrete Optimization 19: 79–102.
- Nash, J. 1951. Non-cooperative games. Annals of Mathematics 54(2): 286–295.
- Russell, S. J.; and Norvig, P. 2016. Artificial Intelligence: A Modern Approach. Malaysia: Pearson Education Limited.
- Ryoo, H. S.; and Sahinidis, N. V. 2001. Analysis of bounds for multilinear functions. Journal of Global Optimization 19(4): 403–424.
- Shoham, Y.; and Leyton-Brown, K. 2008. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press.
- Sinha, A.; Fang, F.; An, B.; Kiekintveld, C.; and Tambe, M. 2018. Stackelberg Security Games: Looking Beyond a Decade of Success. In IJCAI, 5494–5501.
- Timbers, F.; Lockhart, E.; Schmid, M.; Lanctot, M.; and Bowling, M. 2020. Approximate exploitability: Learning a best response in large games. arXiv preprint arXiv:2004.09677.
- Von Stengel, B. 1996. Efficient computation of behavior strategies. Games and Economic Behavior 14(2): 220–246.
- von Stengel, B.; and Koller, D. 1997. Team-maxmin equilibria. Games and Economic Behavior 21(1-2): 309–321.
- Wichardt, P. C. 2008. Existence of Nash equilibria in finite extensive form games with imperfect recall: A counterexample. Games and Economic Behavior 63(1): 366–369.
- Zhang, Y.; and An, B. 2020. Computing team-maxmin equilibria in zero-sum multiplayer extensive-form games. In AAAI.
- Zinkevich, M.; Johanson, M.; Bowling, M.; and Piccione, C. 2008. Regret minimization in games with incomplete information. In NeurIPS, 1729–1736.
