Learning to Play No-Press Diplomacy with Best Response Policy Iteration
NIPS 2020.
We showed that the stochasticity of Sampled Best Responses was beneficial to Best Response Policy Iteration algorithms in Blotto.
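The sketch below illustrates a Sampled Best Response on Colonel Blotto. It is an illustration under stated assumptions, not the authors' implementation. It assumes a two-player Blotto variant scored as fields won minus fields lost, with ties counting for neither side. It also uses a hypothetical base policy that places each coin on a uniformly random field, and the helper names (`blotto_payoff`, `uniform_coin_policy`, `sampled_best_response`) are hypothetical. Candidate allocations are drawn from the base policy and scored against a sample of opponent allocations. The highest-scoring candidate is returned, so the response itself is a random variable.

```python
import random
from typing import Callable, Tuple

# An allocation gives the number of coins placed on each field.
Allocation = Tuple[int, ...]
# A policy is modelled here (assumption) as a sampler driven by an RNG.
Policy = Callable[[random.Random], Allocation]


def blotto_payoff(mine: Allocation, theirs: Allocation) -> int:
    """Assumed scoring: fields won minus fields lost; ties count for neither."""
    won = sum(a > b for a, b in zip(mine, theirs))
    lost = sum(a < b for a, b in zip(mine, theirs))
    return won - lost


def uniform_coin_policy(num_coins: int, num_fields: int) -> Policy:
    """Hypothetical base policy: each coin goes to a uniformly random field."""
    def sample(rng: random.Random) -> Allocation:
        counts = [0] * num_fields
        for _ in range(num_coins):
            counts[rng.randrange(num_fields)] += 1
        return tuple(counts)
    return sample


def sampled_best_response(base_policy: Policy,
                          opponent_policy: Policy,
                          rng: random.Random,
                          num_candidates: int = 16,
                          num_opponent_samples: int = 32) -> Allocation:
    """Pick the best of a few sampled candidates against sampled opponents."""
    # Candidate actions are proposed by the base policy, not enumerated.
    candidates = [base_policy(rng) for _ in range(num_candidates)]
    # The opponent's policy is approximated by a finite sample of its actions.
    opponents = [opponent_policy(rng) for _ in range(num_opponent_samples)]

    def estimated_value(candidate: Allocation) -> float:
        # Monte Carlo estimate of the candidate's expected payoff.
        return sum(blotto_payoff(candidate, o) for o in opponents) / len(opponents)

    return max(candidates, key=estimated_value)


if __name__ == "__main__":
    rng = random.Random(0)
    policy = uniform_coin_policy(num_coins=10, num_fields=3)
    print(sampled_best_response(policy, policy, rng))
```

Both the candidate set and the opponent sample are drawn at random, so repeated calls can return different allocations. This sampling noise is one plausible source of the stochasticity mentioned above, although the sketch does not show how the paper uses it inside Best Response Policy Iteration.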
Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions ar...