Self-Play Or Group Practice: Learning To Play Alternating Markov Game In Multi-Agent System

2020 25th International Conference on Pattern Recognition (ICPR), 2020

Abstract
Research in reinforcement learning has achieved great success in strategic game playing. These successes are largely attributable to combining deep reinforcement learning (DRL) and Monte Carlo tree search (MCTS) with agents trained in a self-play (SP) environment. Under self-play, agents are presented with an incrementally more difficult curriculum, which in turn facilitates learning. However, recent research suggests that agents trained via self-play can easily get stuck in local equilibria. In this paper, we consider a population of agents, each of which independently learns to play an alternating Markov game (AMG). We propose a new training framework, group practice (GP), for a population of decentralized RL agents. Under GP, agents are assigned to multiple learning groups during training; in every episode, an agent is randomly paired with, and practices against, another agent from its learning group. We prove convergence to the optimal value function and to a Nash equilibrium under the GP framework. An experimental study is conducted by applying GP to Q-learning and to deep Q-learning with Monte Carlo tree search on the games of Connect Four and Hex. We verify that GP is a more efficient training scheme than SP given the same amount of training, and we further show that learning effectiveness can be improved by applying local grouping to agents.
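The pairing scheme described in the abstract lends itself to a short sketch. The following Python fragment is a minimal illustration of the GP training loop, assuming a hypothetical "Agent" class and a stubbed "play_episode" function standing in for an alternating Markov game such as Connect Four; the group-assignment rule, update signatures, and all names are illustrative assumptions, not the authors' implementation.

import random

class Agent:
    """Placeholder learner; in the paper this would be a (deep) Q-learner."""
    def __init__(self, agent_id):
        self.agent_id = agent_id

    def update(self, trajectory, outcome):
        # e.g., a Q-learning update from the episode's transitions (stubbed)
        pass

def play_episode(first, second):
    """Play one alternating Markov game between two agents.
    Stubbed: returns an empty trajectory and a random outcome
    (+1 first-player win, -1 loss, 0 draw)."""
    trajectory, outcome = [], random.choice([+1, -1, 0])
    return trajectory, outcome

def group_practice(agents, num_groups, num_episodes):
    # Assign the population into learning groups (here: round-robin split).
    groups = [agents[g::num_groups] for g in range(num_groups)]
    for _ in range(num_episodes):
        for group in groups:
            for agent in group:
                # Each episode, pair the agent with a random partner from
                # the same group (pairing with oneself is excluded).
                partner = random.choice([a for a in group if a is not agent])
                trajectory, outcome = play_episode(agent, partner)
                agent.update(trajectory, outcome)
                partner.update(trajectory, -outcome)  # zero-sum game

if __name__ == "__main__":
    population = [Agent(i) for i in range(8)]
    group_practice(population, num_groups=2, num_episodes=100)

Note that self-play corresponds to the degenerate case where each group contains a single agent that practices against itself; GP generalizes this by sampling opponents from within a group.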
Keywords
alternating Markov game, multi-agent system, strategic game playing, deep reinforcement learning, Monte Carlo tree search, self-play, local equilibria, group practice, decentralized RL agents, learning groups, Q-learning, deep Q-learning, Nash equilibrium, local grouping, learning effectiveness