NeuroHex: A Deep Q-learning Hex Agent

arXiv: Artificial Intelligence (2016)

Abstract
DeepMind’s recent spectacular success in using deep convolutional neural nets and machine learning to build superhuman-level agents—e.g. for Atari games via deep Q-learning and for the game of Go via other deep reinforcement learning methods—raises many questions, including to what extent these methods will succeed in other domains. In this paper we consider DQL for the game of Hex: after supervised initialization, we use self-play to train NeuroHex, an 11-layer convolutional neural network that plays Hex on the 13 × 13 board. Hex is the classic two-player alternate-turn stone placement game played on a rhombus of hexagonal cells in which the winner is whoever connects their two opposing sides. Despite the large action and state space, our system trains a Q-network capable of strong play with no search. After two weeks of Q-learning, NeuroHex achieves respective win-rates of 20.4% as first player and 2.1% as second player against a 1-s/move version of MoHex, the current ICGA Olympiad Hex champion. Our data suggests further improvement might be possible with more training time.
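The abstract describes an 11-layer convolutional Q-network that selects moves on the 13 × 13 board with no search at play time. As a rough illustration only (not the authors' code), the PyTorch-style sketch below shows what such a network and greedy move selection might look like; the input-plane encoding, channel width, and activation choices are assumptions for illustration.

```python
# Illustrative sketch only: an 11-layer convolutional Q-network for 13x13 Hex.
# Input planes, channel counts, and the tanh output are assumptions, not the
# paper's exact architecture.
import torch
import torch.nn as nn

BOARD = 13

class HexQNet(nn.Module):
    def __init__(self, channels=64, depth=11):
        super().__init__()
        layers = []
        in_ch = 6  # assumed input planes (e.g. own/opponent stones, empty cells, edge encodings)
        for _ in range(depth):
            layers.append(nn.Conv2d(in_ch, channels, kernel_size=3, padding=1))
            layers.append(nn.ReLU())
            in_ch = channels
        self.body = nn.Sequential(*layers)
        # one Q-value per board cell (move)
        self.head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        # x: (batch, planes, 13, 13) -> (batch, 169) Q-values in [-1, 1]
        q = torch.tanh(self.head(self.body(x)))
        return q.view(x.size(0), BOARD * BOARD)

if __name__ == "__main__":
    net = HexQNet()
    state = torch.zeros(1, 6, BOARD, BOARD)   # empty-board encoding (illustrative)
    q_values = net(state)                     # shape (1, 169)
    best_move = q_values.argmax(dim=1)        # greedy move selection, no search
    print(q_values.shape, best_move)
```

In this sketch, play without search simply means taking the argmax Q-value over legal cells; during Q-learning the same network would be regressed toward game-outcome targets generated by self-play.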
Keywords
Optimal Policy, Reinforcement Learning, Gradient Descent, Convolutional Neural Network, Policy Network