Mastering the game of Go with deep neural networks and tree search (Nature)
Date: 2019-01-04 04:05
Keywords: AlphaGo, deep neural networks, Monte Carlo tree search
This note covers the techniques behind the Go-playing program AlphaGo. Computer Go poses two main challenges: an enormous search space, and the difficulty of evaluating board positions and candidate moves. AlphaGo addresses these with two deep neural networks: a value network that evaluates board positions and a policy network that selects the next move. Both networks are trained through a combination of supervised learning (from human expert games) and reinforcement learning (from self-play), and are combined via Monte Carlo Tree Search (MCTS). At the time of publication, AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the European champion 5-0.
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
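To make the "MCTS guided by policy and value networks" idea concrete, here is a minimal sketch of PUCT-style selection in the spirit of AlphaGo's search. Everything here is a toy assumption: the policy and value networks are replaced by stub functions, the game state is ignored, and the single-step "tree" is far simpler than the real multi-ply search with rollouts.

```python
import math
import random

class Node:
    """One edge statistic bundle: prior P(s,a), visit count N(s,a), value sum W(s,a)."""
    def __init__(self, prior):
        self.prior = prior      # P(s, a) from the policy network
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # W(s, a)

    def q(self):
        # Mean action value Q(s, a) = W(s, a) / N(s, a)
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(children, c_puct=1.0):
    """Pick argmax over Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a))."""
    total = sum(ch.visits for ch in children.values())
    best_action, best_score = None, -float("inf")
    for action, child in children.items():
        u = c_puct * child.prior * math.sqrt(total) / (1 + child.visits)
        score = child.q() + u
        if score > best_score:
            best_action, best_score = action, score
    return best_action

def policy_stub(state):
    # Stand-in for the policy network: uniform priors over 3 hypothetical moves.
    return {a: 1.0 / 3 for a in range(3)}

def value_stub(state):
    # Stand-in for the value network: a noisy evaluation in [-1, 1].
    return random.uniform(-1.0, 1.0)

def mcts(root_state, num_simulations=100):
    """Run a one-ply toy search: expand with policy priors, back up value estimates."""
    children = {a: Node(p) for a, p in policy_stub(root_state).items()}
    for _ in range(num_simulations):
        action = select_child(children)
        leaf = children[action]
        # In AlphaGo, the leaf evaluation mixes the value network's output
        # with the result of a fast rollout; here only the value stub is used.
        v = value_stub(root_state)
        leaf.visits += 1
        leaf.value_sum += v
    # Play the most-visited move, as AlphaGo does at the root.
    return max(children, key=lambda a: children[a].visits)
```

The key design point the sketch illustrates is how the two networks divide labor: the policy network's priors focus exploration (the `u` term), while the value network's estimates accumulate into `Q` and steer exploitation, letting the search go far deeper than uninformed Monte Carlo simulation.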