We propose a MOBA AI learning paradigm that methodologically enables playing full MOBA games with deep reinforcement learning.

Towards Playing Full MOBA Games with Deep Reinforcement Learning

NeurIPS 2020


Abstract

MOBA games, e.g., Honor of Kings, League of Legends, and Dota 2, pose grand challenges to AI systems such as multi-agent, enormous state-action space, complex action control, etc. Developing AI for playing MOBA games has raised much attention accordingly. However, existing work falls short in handling the raw game complexity caused by the …

Introduction
  • Due to its playing mechanics, which involve multi-agent competition and cooperation, imperfect information, complex action control, and an enormous state-action space, MOBA is considered a preferable testbed for AI research [29, 25].
  • A MOBA game such as Honor of Kings, even with significant discretization, can have a state and action space of magnitude 10^20,000 [36], whereas that of a conventional game AI testbed such as Go is at most 10^360 [30].
  • MOBA games are further complicated by the real-time strategies of multiple heroes, each of which is uniquely designed.
Highlights
  • We propose a learning paradigm for supporting full MOBA game-playing with deep reinforcement learning
  • We develop an efficient and effective drafting agent based on Monte-Carlo tree search (MCTS) [7] (a minimal drafting sketch follows this list)
  • We developed a combination of novel and existing learning techniques, including off-policy adaption, multi-head value estimation, curriculum self-play learning, multi-teacher policy distillation, and Monte-Carlo tree search, to deal with the emerging problems caused by training and playing a large pool of heroes
  • To the best of our knowledge, this is the first reinforcement-learning-based MOBA AI program that can play a pool of 40 heroes and more
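The MCTS-based drafting agent highlighted above chooses the next hero pick by searching over possible draft continuations. The snippet below is a minimal, hypothetical UCT-style sketch of that idea, not the authors' implementation: the 40-hero pool size comes from the experiments reported here, but the strictly alternating pick order and the stand-in `win_rate_predictor` (which scores a completed 10-hero lineup) are assumptions for illustration.

```python
import math
import random

# Minimal UCT-style drafting sketch (illustration only, not the paper's code).
# Assumptions: a fixed 40-hero pool, a strictly alternating pick order, and a
# stand-in win_rate_predictor for completed lineups.

HERO_POOL = list(range(40))   # 40 candidate heroes, as in the reported experiments
PICKS_PER_TEAM = 5

def win_rate_predictor(lineup0, lineup1):
    """Stand-in for a trained win-rate predictor that labels completed lineups."""
    return random.random()    # replace with a real predictor

class Node:
    def __init__(self, picked, parent=None):
        self.picked = picked              # tuple of hero ids picked so far, in order
        self.parent = parent
        self.children = {}                # hero id -> Node
        self.visits = 0
        self.value = 0.0                  # sum of rollout results for the picking team

def team_to_pick(picked):
    return len(picked) % 2                # alternating pick order (a simplification)

def rollout(picked):
    """Complete the draft randomly and score it with the win-rate predictor."""
    remaining = [h for h in HERO_POOL if h not in picked]
    random.shuffle(remaining)
    full = list(picked) + remaining[: 2 * PICKS_PER_TEAM - len(picked)]
    lineup0 = [h for i, h in enumerate(full) if i % 2 == 0]
    lineup1 = [h for i, h in enumerate(full) if i % 2 == 1]
    return win_rate_predictor(lineup0, lineup1)   # win probability of team 0

def uct_select(node, c=1.4):
    def score(child):
        q = child.value / (child.visits + 1e-8)
        u = c * math.sqrt(math.log(node.visits + 1) / (child.visits + 1e-8))
        return q + u
    return max(node.children.values(), key=score)

def search(root_picked, iterations=1000):
    root = Node(tuple(root_picked))
    for _ in range(iterations):
        node = root
        # Selection / expansion: descend with UCT, expand one untried pick.
        while len(node.picked) < 2 * PICKS_PER_TEAM:
            legal = [h for h in HERO_POOL if h not in node.picked]
            unexpanded = [h for h in legal if h not in node.children]
            if unexpanded:
                h = random.choice(unexpanded)
                node.children[h] = Node(node.picked + (h,), parent=node)
                node = node.children[h]
                break
            node = uct_select(node)
        # Simulation: finish the draft and score it.
        p0_win = rollout(node.picked)
        # Backpropagation: credit each node from the perspective of the team that picked into it.
        while node is not None:
            node.visits += 1
            mover = team_to_pick(node.picked[:-1]) if node.picked else 0
            node.value += p0_win if mover == 0 else 1.0 - p0_win
            node = node.parent
    best = max(root.children.items(), key=lambda kv: kv[1].visits)
    return best[0]

if __name__ == "__main__":
    print("suggested next pick:", search(root_picked=[3, 17, 8]))
```

Each iteration expands one pick, completes the draft with a random rollout, and backpropagates the predicted win rate; a real MOBA drafting order (e.g., 1-2-2-2-2-1 with bans) and the trained predictor would replace these simplifications.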
Results
  • AI performance (Section 4.2.1): The authors train an AI that plays a pool of 40 heroes in Honor of Kings, covering all hero roles.
  • A number of episodes and complete games played between the AI and professional players are publicly available at https://sourl.cn/NVwV6L, in which various aspects of the AI are shown, including long-term planning, macro-strategy, team cooperation, high-level turret pushing without minions, solo competition, counter-strategies to the enemy's ganks, etc.
  • Through these game videos, one can clearly see the strategies and micro controls mastered by the AI.
Conclusion
  • Conclusion and future work: In this paper, the authors proposed a MOBA AI learning paradigm towards playing full MOBA games with deep reinforcement learning.
  • To the best of their knowledge, this is the first reinforcement-learning-based MOBA AI program that can play a pool of 40 heroes and more.
  • Previously, no AI program for sophisticated strategy video games had gone through such large-scale, rigorous, and repeated performance testing.
  • The authors will continue to work on complete hero-pool support and to investigate more efficient training methods to further shorten the MOBA AI learning process.
  • To facilitate research on game intelligence, the authors will develop subtasks of MOBA game-playing for the AI community.
Summary
  • Objectives: The major difference between this work and OpenAI Five is that the goal of this paper is to develop AI programs towards playing full MOBA games.
Tables
  • Table 1: Comparing the training time of CSPL (curriculum self-play learning) and the baseline
Related Work
  • Our work belongs to system-level AI development for strategy video game playing, so we mainly discuss representative works along this line, covering RTS and MOBA games.

    General RTS games. StarCraft has been used as a testbed for game AI research in RTS for many years. Methods adopted by existing studies include rule-based methods, supervised learning, reinforcement learning, and their combinations [23, 34]. A representative rule-based method is SAIDA, the champion of the StarCraft AI Competition 2018 (see https://github.com/TeamSAIDA/SAIDA). Among learning-based methods, AlphaStar recently combined supervised learning and multi-agent reinforcement learning and achieved grandmaster level in StarCraft II [33]. Our value estimation (Section 3.2) is similar to AlphaStar's in that it uses the invisible opponent's information, as sketched below.
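Value estimation that consumes information invisible to the acting policy is commonly realized with an asymmetric actor-critic: during training the critic sees extra opponent features that the policy cannot, so nothing hidden is needed at inference time. The sketch below illustrates only this wiring, with made-up feature sizes and a hypothetical `ActorCritic` module in PyTorch; it is an assumption-level illustration, not the paper's (or AlphaStar's) network.

```python
import torch
import torch.nn as nn

# Illustration only: the critic additionally consumes "invisible" opponent
# features during training, while the actor uses only observable features.
# All dimensions below are made up for the example.

OBS_DIM, OPP_DIM, HIDDEN, NUM_ACTIONS = 128, 64, 256, 16

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(OBS_DIM, HIDDEN), nn.ReLU(),
                                   nn.Linear(HIDDEN, NUM_ACTIONS))
        # Critic input is the concatenation of observable and opponent features.
        self.critic = nn.Sequential(nn.Linear(OBS_DIM + OPP_DIM, HIDDEN), nn.ReLU(),
                                    nn.Linear(HIDDEN, 1))

    def policy_logits(self, obs):
        return self.actor(obs)                       # usable at inference, obs only

    def value(self, obs, opponent_feats):
        return self.critic(torch.cat([obs, opponent_feats], dim=-1))  # training only

if __name__ == "__main__":
    model = ActorCritic()
    obs = torch.randn(4, OBS_DIM)
    opp = torch.randn(4, OPP_DIM)
    print(model.policy_logits(obs).shape, model.value(obs, opp).shape)
```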
Study Subjects and Analysis
Samples per lineup: 10
The value network is trained using 100 million samples (containing 10 million lineups; each lineup has 10 samples because we pick 10 heroes for a completed lineup) generated from MCTS-based drafting strategies. The labels for the 10 samples in each lineup are identical and are calculated using the win-rate predictor. To evaluate the trained AI's performance, we deploy the AI model into Honor of Kings to play against top human players.
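Concretely, the sampling scheme above can be pictured as: each drafted lineup is scored once by the win-rate predictor, and that single score becomes the shared label of 10 per-hero samples. The following is a small illustrative sketch under assumed encodings; the feature layout and the `win_rate_predictor` stub are stand-ins, not the authors' code.

```python
import random

# Illustrative sketch (not the authors' code): turn drafted lineups into
# value-network training samples. Each completed 10-hero lineup yields 10
# samples, all sharing one label from the win-rate predictor.

HERO_POOL = list(range(40))

def win_rate_predictor(lineup0, lineup1):
    """Stand-in for the trained win-rate predictor used to label lineups."""
    return random.random()

def samples_from_lineup(lineup0, lineup1):
    label = win_rate_predictor(lineup0, lineup1)     # one label per lineup
    samples = []
    for team, lineup in ((0, lineup0), (1, lineup1)):
        for hero in lineup:
            # Hypothetical flat encoding: hero id, own team, both lineups.
            features = {"hero": hero, "team": team,
                        "ally": list(lineup0 if team == 0 else lineup1),
                        "enemy": list(lineup1 if team == 0 else lineup0)}
            samples.append((features, label))        # 10 samples, identical label
    return samples

def build_dataset(num_lineups):
    dataset = []
    for _ in range(num_lineups):
        picks = random.sample(HERO_POOL, 10)         # stand-in for MCTS-generated drafts
        dataset.extend(samples_from_lineup(picks[:5], picks[5:]))
    return dataset

if __name__ == "__main__":
    data = build_dataset(num_lineups=3)
    print(len(data), "samples; label of first lineup:", data[0][1])
```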

References
  • [1] Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41–48, 2009.
  • [2] C. Berner, G. Brockman, B. Chan, V. Cheung, P. Debiak, C. Dennison, D. Farhi, Q. Fischer, S. Hashme, C. Hesse, et al. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680, 2019.
  • [3] N. Brown and T. Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 359(6374):418–424, 2018.
  • [4] L. Bu, R. Babu, B. De Schutter, et al. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2):156–172, 2008.
  • [5] Z. Chen, T.-H. D. Nguyen, Y. Xu, C. Amato, S. Cooper, Y. Sun, and M. S. El-Nasr. The art of drafting: a team-oriented hero recommendation system for multiplayer online battle arena games. In Proceedings of the 12th ACM Conference on Recommender Systems, pages 200–208, 2018.
  • [6] Z. Chen and D. Yi. The game imitation: deep supervised convolutional networks for quick video game AI. arXiv preprint arXiv:1702.05663, 2017.
  • [9] W. M. Czarnecki, R. Pascanu, S. Osindero, S. Jayakumar, G. Swirszcz, and M. Jaderberg. Distilling policy distillation. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1331–1340, 2019.
  • [10] C. Eisenach, H. Yang, J. Liu, and H. Liu. Marginal policy gradients: a unified family of estimators for bounded action spaces with applications. In The Seventh International Conference on Learning Representations (ICLR 2019), 2019.
  • [11] L. Espeholt, R. Marinier, P. Stanczyk, K. Wang, and M. Michalski. SEED RL: scalable and efficient deep-RL with accelerated central inference. arXiv preprint arXiv:1910.06591, 2019.
  • [12] L. Espeholt, H. Soyer, R. Munos, K. Simonyan, V. Mnih, T. Ward, Y. Doron, V. Firoiu, T. Harley, I. Dunning, et al. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. arXiv preprint arXiv:1802.01561, 2018.
  • [13] P. Hernandez-Leal, M. Kaisers, T. Baarslag, and E. M. de Cote. A survey of learning in multiagent environments: dealing with non-stationarity. arXiv preprint arXiv:1707.09183, 2017.
  • [14] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
  • [15] M. Jaderberg, W. M. Czarnecki, I. Dunning, L. Marris, G. Lever, A. G. Castañeda, C. Beattie, N. C. Rabinowitz, A. S. Morcos, A. Ruderman, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 364(6443):859–865, 2019.
  • [16] D. Jiang, E. Ekwedike, and H. Liu. Feedback-based tree search for reinforcement learning. In International Conference on Machine Learning, pages 2289–2298, 2018.
  • [17] D. P. Kingma and J. Ba. Adam: a method for stochastic optimization. In Y. Bengio and Y. LeCun, editors, 3rd International Conference on Learning Representations, 2015.
  • [18] D. E. Knuth and R. W. Moore. An analysis of alpha-beta pruning. Artificial Intelligence, 6(4):293–326, 1975.
  • [19] L. Kocsis and C. Szepesvári. Bandit based Monte-Carlo planning. In European Conference on Machine Learning, pages 282–293, 2006.
  • [20] V. R. Konda and J. N. Tsitsiklis. Actor-critic algorithms. In Advances in Neural Information Processing Systems, pages 1008–1014, 2000.
  • [21] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529, 2015.
  • [22] J. T. Morisette and S. Khorram. Exact binomial confidence interval for proportions. Photogrammetric Engineering and Remote Sensing, 64(4):281–282, 1998.
  • [23] S. Ontanón, G. Synnaeve, A. Uriarte, F. Richoux, D. Churchill, and M. Preuss. A survey of real-time strategy game AI research and competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games, 5(4):293–311, 2013.
  • [24] OpenAI. OpenAI Five defeats Dota 2 world champions. https://openai.com/blog/openai-five-defeats-dota-2-world-champions/, 2019.
  • [25] G. Robertson and I. Watson. A review of real-time strategy game AI. AI Magazine, 35(4):75–104, 2014.
  • [26] A. A. Rusu, S. G. Colmenarejo, C. Gulcehre, G. Desjardins, J. Kirkpatrick, R. Pascanu, V. Mnih, K. Kavukcuoglu, and R. Hadsell. Policy distillation. arXiv preprint arXiv:1511.06295, 2015.
  • [27] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015.
  • [28] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  • [29] V. d. N. Silva and L. Chaimowicz. MOBA: a new arena for game AI. arXiv preprint arXiv:1705.10443, 2017.
  • [30] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
  • [31] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, et al. Mastering the game of Go without human knowledge. Nature, 550(7676):354, 2017.
  • [32] H. Van Seijen, M. Fatemi, J. Romoff, R. Laroche, T. Barnes, and J. Tsang. Hybrid reward architecture for reinforcement learning. In Advances in Neural Information Processing Systems, pages 5392–5402, 2017.
  • [33] O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354, 2019.
  • [34] O. Vinyals, T. Ewalds, S. Bartunov, P. Georgiev, A. S. Vezhnevets, M. Yeo, A. Makhzani, H. Küttler, J. Agapiou, J. Schrittwieser, et al. StarCraft II: a new challenge for reinforcement learning. arXiv preprint arXiv:1708.04782, 2017.
  • [35] Q. Wang, J. Xiong, L. Han, H. Liu, and T. Zhang. Exponentially weighted imitation learning for batched historical data. In Advances in Neural Information Processing Systems (NIPS 2018), pages 6288–6297, 2018.
  • [36] B. Wu. Hierarchical macro strategy model for MOBA game AI. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 1206–1213, 2019.
  • [37] D. Ye, Z. Liu, M. Sun, B. Shi, P. Zhao, H. Wu, H. Yu, S. Yang, X. Wu, Q. Guo, et al. Mastering complex control in MOBA games with deep reinforcement learning. In AAAI, pages 6672–6679, 2020.
Authors
Guibin Chen
Wen Zhang
Chen Sheng
Bo Yuan
Jia Chen
Hongsheng Yu