Learning to Play No-Press Diplomacy with Best Response Policy Iteration

NIPS 2020.

We showed that the stochasticity of Sampled Best Responses was beneficial to Best Response Policy Iteration algorithms in Blotto.

Abstract:

Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions ar...
