DREAM: Deep Regret minimization with Advantage baselines and Model-free learning
arXiv (2020)
Abstract
We introduce DREAM, a deep reinforcement learning algorithm that finds optimal strategies in imperfect-information games with multiple agents. Formally, DREAM converges to a Nash Equilibrium in two-player zero-sum games and to an extensive-form coarse correlated equilibrium in all other games. Our primary innovation is an effective algorithm that, in contrast to other regret-based deep learning algorithms, does not require access to a perfect simulator of the game to achieve good performance. We show that DREAM empirically achieves state-of-the-art performance among model-free algorithms in popular benchmark games, and is even competitive with algorithms that do use a perfect simulator.
Keywords
deep regret minimization, advantage baselines, learning, model-free