The NetHack Learning Environment

NeurIPS 2020.


Abstract

Progress in Reinforcement Learning (RL) algorithms goes hand-in-hand with the development of challenging environments that test the limits of current methods. While existing RL environments are either sufficiently complex or based on fast simulation, they are rarely both. Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for RL research based on the popular single-player terminal-based roguelike game, NetHack.
Introduction
  • Recent advances in (Deep) Reinforcement Learning (RL) have been driven by the development of novel simulation environments, such as the Arcade Learning Environment (ALE) [9], StarCraft [64, 69], BabyAI [16], Obstacle Tower [38], Minecraft [37, 29, 35], and Procgen Benchmark [18].
  • Montezuma’s Revenge highlighted that methods performing well on other ALE tasks were unable to learn successfully in this sparse-reward environment.
  • This sparked a long line of research on novel methods for exploration [e.g., 8, 66, 53] and learning from demonstrations [e.g., 31, 62, 6].
  • While Go-Explore is an impressive solution for Montezuma’s Revenge, it exploits the determinism of environment transitions, allowing it to memorize sequences of actions that lead to previously visited states from which the agent can continue to explore.
Highlights
  • Recent advances in (Deep) Reinforcement Learning (RL) have been driven by the development of novel simulation environments, such as the Arcade Learning Environment (ALE) [9], StarCraft [64, 69], BabyAI [16], Obstacle Tower [38], Minecraft [37, 29, 35], and Procgen Benchmark [18].
  • We present the NetHack Learning Environment (NLE), a procedurally generated environment that strikes a balance between complexity and speed.
  • We present quantitative results on a suite of tasks included in NLE using a standard distributed Deep RL baseline and a popular exploration method, before analyzing agent behavior qualitatively.
  • We demonstrate that current state-of-the-art model-free RL serves as a sensible baseline, and we provide an in-depth analysis of learned agent behaviors.
  • Proposed formulations of intrinsic motivation based on seeking novelty [8, 53, 13] or maximizing surprise [56, 12, 57] are likely insufficient to make progress on NetHack, given that an agent will constantly find itself in novel states or observe unexpected environment dynamics (a sketch of such a novelty bonus, RND [13], follows this list).
  • We believe the NetHack Learning Environment strikes an excellent balance between complexity and speed while encompassing a variety of challenges for the research community.
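To make the intrinsic-motivation point above concrete, the following is a minimal sketch of a Random Network Distillation (RND) bonus [13], the exploration method evaluated in the paper: the intrinsic reward is the error of a trained predictor against a fixed, randomly initialized target network. This PyTorch sketch assumes a flat observation vector for illustration; the authors' agents use NetHack-specific observation encoders, so treat it as a generic recipe rather than their implementation.

```python
import torch
import torch.nn as nn


class RNDBonus(nn.Module):
    """Random Network Distillation: intrinsic reward = prediction error of a
    learned predictor against a frozen, randomly initialized target network."""

    def __init__(self, obs_dim: int, feat_dim: int = 128):
        super().__init__()

        def mlp() -> nn.Sequential:
            return nn.Sequential(
                nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim)
            )

        self.target, self.predictor = mlp(), mlp()
        for p in self.target.parameters():
            p.requires_grad_(False)  # the target network is never trained

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            target_feat = self.target(obs)
        pred_feat = self.predictor(obs)
        # The per-sample squared error serves both as the predictor's training
        # loss and as the intrinsic reward: rarely visited states are predicted
        # poorly and therefore yield a larger bonus.
        return ((pred_feat - target_feat) ** 2).mean(dim=-1)
```

In use, the mean of this error over a batch is minimized to train the predictor, and the per-step value is added (suitably scaled) to the environment reward; on NetHack, as argued above, such novelty bonuses fire almost everywhere, which limits how much guidance they provide.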
Results
  • The authors present quantitative results on a suite of tasks included in NLE using a standard distributed Deep RL baseline and a popular exploration method, before analyzing agent behavior qualitatively.
  • The agent has to learn to reliably search for hidden passages and secret doors.
  • Often, this involves using the search action consecutively many times, sometimes even at many locations on the map (a minimal interaction sketch follows this list).
  • Using RND exploration, the authors observe substantial gains in success rate for the monk (+13.58pp), tourist (+6.52pp), and valkyrie (+16.34pp) roles, but worse results for the wizard role (−12.96pp).
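The interaction sketch referenced above shows how such repeated searching looks through NLE's Python/Gym interface. The environment id, enum, and attribute names (`NetHackScore-v0`, `nethack.Command.SEARCH`, the env's action tuple) are taken from the open-source nle package rather than from this summary and may differ across versions; this is a usage sketch, not the authors' agent.

```python
import gym
import nle  # noqa: F401  -- importing nle registers the NetHack tasks with gym
from nle import nethack

env = gym.make("NetHackScore-v0")
obs = env.reset()

# Look up the index of the dedicated `search` command in this task's action set.
# The attribute name is assumed from the nle source, hence the fallback.
acts = getattr(env.unwrapped, "actions", None) or getattr(env.unwrapped, "_actions", ())
search = acts.index(nethack.Command.SEARCH) if nethack.Command.SEARCH in acts else None

done, steps = False, 0
while not done and steps < 200:
    # Repeatedly searching in place is how hidden doors and passages are
    # revealed; a real agent must learn when, where, and how often to do this.
    action = search if search is not None else env.action_space.sample()
    obs, reward, done, info = env.step(action)
    steps += 1

env.close()
```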
Conclusion
  • The NetHack Learning Environment is a fast, complex, procedurally generated environment for advancing research in RL.
  • NetHack provides interesting challenges for exploration methods given the extremely large number of possible states and wide variety of environment dynamics to discover.
  • The authors believe the NetHack Learning Environment strikes an excellent balance between complexity and speed while encompassing a variety of challenges for the research community.
Tables
  • Table 1: Command actions.
  • Table 2
  • Table 3: Compass direction actions.
  • Table 4: Comparison between NLE and popular environments when using their respective Python Gym interface. SPS stands for “environment steps per second”. All environments but ObstacleTowerEnv were run via gym with standard settings (and headless when possible) for 60 seconds (a minimal timing sketch follows this list).
  • Table 5: Metrics averaged over the last 1,000 episodes for each task.
  • Table 6: Top five of the last 1,000 episodes in the score task.
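The timing sketch referenced in the Table 4 caption: a rough, single-process way to estimate environment steps per second (SPS) over 60 seconds through the Gym interface, driving the environment with random actions. It is an illustrative approximation of the table's methodology, not the authors' benchmarking code.

```python
import time

import gym
import nle  # noqa: F401  -- registers "NetHackScore-v0" with gym

env = gym.make("NetHackScore-v0")
env.reset()

steps, deadline = 0, time.time() + 60.0  # measure for 60 seconds, as in Table 4
while time.time() < deadline:
    _, _, done, _ = env.step(env.action_space.sample())
    steps += 1
    if done:
        env.reset()

print(f"~{steps / 60.0:.0f} environment steps per second")
env.close()
```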
Related Work
  • Progress in RL has historically been achieved both by algorithmic innovations and by the development of novel environments on which to train and evaluate agents. Below, we review recent RL environments and delineate their strengths and weaknesses as testbeds for current methods and future research.

    Recent Game-Based Environments: Retro video games have been a major catalyst for Deep RL research. ALE [9] provides a unified interface to Atari 2600 games, which enables testing of RL algorithms on high-dimensional visual observations quickly and cheaply, resulting in numerous Deep RL publications over the years [4]. The Gym Retro environment [51] expands the list of classic games, but focuses on evaluating visual generalization and transfer learning on a single game, Sonic The Hedgehog.

    Both StarCraft: Brood War and StarCraft II have been successfully employed as RL environments [64, 69] for research on, for example, planning [52, 49], multi-agent systems [27, 63], imitation learning [70], and model-free reinforcement learning [70]. However, the complexity of these games creates a high entry barrier, both in terms of the computational resources required and the intricate baseline models that demand a high degree of domain knowledge to extend.
Funding
  • Nantas Nardelli is supported by EPSRC/MURI grant EP/N019474/1
References
  • Pieter Abbeel and Andrew Y. Ng. Apprenticeship learning via inverse reinforcement learning. In ICML, 2004.
  • Aransentin Breggan Hampe Pellsson. SWAGGINZZZ. https://pellsson.github.io/, 2019. Accessed:2020-05-30.
  • Brenna Argall, Sonia Chernova, Manuela M. Veloso, and Brett Browning. A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57:469–483, 2009.
  • Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34(6):26–38, 2017.
  • Andrea Asperti, Daniele Cortesi, Carlo De Pieri, Gianmaria Pedrini, and Francesco Sovrano. Crawling in rogue’s dungeons with deep reinforcement techniques. IEEE Transactions on Games, 2019.
  • Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, and Nando de Freitas. Playing hard exploration games by watching youtube. In NeurIPS, 2018.
  • Charles Beattie, Joel Z. Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright, Heinrich Küttler, Andrew Lefrancq, Simon Green, Víctor Valdés, Amir Sadik, Julian Schrittwieser, Keith Anderson, Sarah York, Max Cant, Adam Cain, Adrian Bolton, Stephen Gaffney, Helen King, Demis Hassabis, Shane Legg, and Stig Petersen. Deepmind lab. CoRR, abs/1612.03801, 2016.
  • Marc Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, and Remi Munos. Unifying count-based exploration and intrinsic motivation. In NeurIPS, 2016.
  • Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 2013.
  • S. R. K. Branavan, David Silver, and Regina Barzilay. Learning to win by reading manuals in a monte-carlo framework. In ACL, 2011.
  • Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. CoRR, abs/1606.01540, 2016.
  • Yuri Burda, Harrison Edwards, Deepak Pathak, Amos J. Storkey, Trevor Darrell, and Alexei A. Efros. Large-scale study of curiosity-driven learning. In ICLR, 2019.
  • Yuri Burda, Harrison Edwards, Amos J. Storkey, and Oleg Klimov. Exploration by random network distillation. In ICLR, 2019.
  • Jonathan Campbell and Clark Verbrugge. Learning combat in NetHack. In AIIDE, 2017.
  • Jonathan Campbell and Clark Verbrugge. Exploration in NetHack with secret discovery. IEEE Transactions on Games, 2018.
  • Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, and Yoshua Bengio. Babyai: A platform to study the sample efficiency of grounded language learning. In ICLR, 2018.
  • Maxime Chevalier-Boisvert, Lucas Willems, and Suman Pal. Minimalistic Gridworld Environment for OpenAI Gym. https://github.com/maximecb/gym-minigrid, 2018.
  • Karl Cobbe, Christopher Hesse, Jacob Hilton, and John Schulman. Leveraging procedural generation to benchmark reinforcement learning. arXiv preprint arXiv:1912.01588, 2019.
  • Karl Cobbe, Oleg Klimov, Christopher Hesse, Taehoon Kim, and John Schulman. Quantifying generalization in reinforcement learning. In ICML, 2019.
  • Dustin Dannenhauer, Michael W Floyd, Jonathan Decker, and David W Aha. Dungeon crawl stone soup as an evaluation domain for artificial intelligence. Workshop on Games and Simulations for Artificial Intelligence, AAAI, 2019.
  • Peter Dayan and Geoffrey E. Hinton. Feudal reinforcement learning. In NeurIPS, 1992.
  • Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, and Jeff Clune. Go-Explore: A New Approach for Hard-exploration Problems. arXiv preprint arXiv:1901.10995, 2019.
  • Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, and Jeff Clune. First return then explore. CoRR, abs/2004.12919, 2020.
  • Lasse Espeholt, Hubert Soyer, Rémi Munos, Karen Simonyan, Volodymyr Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, and Koray Kavukcuoglu. IMPALA: scalable distributed deep-rl with importance weighted actor-learner architectures. In ICML, 2018.
  • Eva Myers. List of Nethack spoilers. https://sites.google.com/view/evasroguelikegamessite/list-of-nethack-spoilers, 2020. Accessed:2020-06-03.
  • Juan Manuel Sanchez Fernandez. Reinforcement Learning for roguelike type games (eliteMod v0.9). https://kcir.pwr.edu.pl/~witold/aiarr/2009_projekty/elitmod/, 2009. Accessed:2020-01-19.
  • Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H.S. Torr, Pushmeet Kohli, and Shimon Whiteson. Stabilising experience replay for deep multi-agent reinforcement learning. In ICML, 2017.
  • Javier García and Fernando Fernández. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 2015.
  • William H. Guss, Cayden Codel, Katja Hofmann, Brandon Houghton, Noboru Kuno, Stephanie Milani, Sharada Mohanty, Diego Perez Liebana, Ruslan Salakhutdinov, Nicholay Topin, et al. The MineRL competition on sample efficient reinforcement learning using human priors. NeurIPS Competition Track, 2019.
  • Luke Harries, Sebastian Lee, Jaroslaw Rzepecki, Katja Hofmann, and Sam Devlin. Mazeexplorer: A customisable 3d benchmark for assessing generalisation in reinforcement learning. In IEEE Conference on Games, 2019.
  • Todd Hester, Matej Vecerík, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Ian Osband, Gabriel Dulac-Arnold, John Agapiou, Joel Z. Leibo, and Audrunas Gruslys. Deep q-learning from demonstrations. In AAAI, 2017.
  • Felix Hill, Andrew K. Lampinen, Rosalia Schneider, Stephen Clark, Matthew Botvinick, James L. McClelland, and Adam Santoro. Emergent systematic generalization in a situated agent. In ICLR, 2020.
  • Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9 (8):1735–1780, 1997. doi: 10.1162/neco.1997.9.8.1735.
  • International Roguelike Development Conference. Berlin Interpretation. http://www.roguebasin.com/index.php?title=Berlin_Interpretation, 2008. Accessed: 2020-01-08.
  • Yacine Jernite, Kavya Srinet, Jonathan Gray, and Arthur Szlam. Craftassist instruction parsing: Semantic parsing for a minecraft assistant. CoRR, abs/1905.01978, 2019.
  • Yiding Jiang, Shixiang Gu, Kevin Murphy, and Chelsea Finn. Language as an abstraction for hierarchical deep reinforcement learning. In NeurIPS, 2019.
  • Matthew Johnson, Katja Hofmann, Tim Hutton, and David Bignell. The malmo platform for artificial intelligence experimentation. In IJCAI, 2016.
  • Arthur Juliani, Ahmed Khalifa, Vincent-Pierre Berges, Jonathan Harper, Ervin Teng, Hunter Henry, Adam Crespi, Julian Togelius, and Danny Lange. Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning. In IJCAI, 2019.
  • Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, and Sebastian Risi. Illuminating generalization in deep reinforcement learning through procedural level generation. arXiv preprint arXiv:1806.10729, 2018.
  • Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. Reinforcement learning: A survey. Journal of artificial intelligence research, 1996.
  • Yuji Kanagawa and Tomoyuki Kaneko. Rogue-gym: A new challenge for generalization in reinforcement learning. In IEEE Conference on Games, 2019.
  • Michał Kempka, Marek Wydmuch, Grzegorz Runc, Jakub Toczek, and Wojciech Jaskowski. Vizdoom: A doom-based ai research platform for visual reinforcement learning. In IEEE Conference on Computational Intelligence and Games, 2016.
  • Kenneth Lorber. NetHack Home Page. https://nethack.org, 2020. Accessed:2020-05-30.
  • Heinrich Küttler, Nantas Nardelli, Thibaut Lavril, Marco Selvatici, Viswanath Sivakumar, Tim Rocktäschel, and Edward Grefenstette. TorchBeast: A PyTorch Platform for Distributed RL. arXiv, abs/1910.03552, 2019.
  • David Lopez-Paz and Marc’Aurelio Ranzato. Gradient episodic memory for continual learning. In NeurIPS, 2017.
  • Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, and Tim Rocktäschel. A survey of reinforcement learning informed by natural language. In IJCAI, 2019.
  • M. Drew Streib. Public NetHack server at alt.org (NAO). https://alt.org/nethack/, 2020. Accessed:2020-05-30.
  • Daniel J. Mankowitz, Augustin Zídek, André Barreto, Dan Horgan, Matteo Hessel, John Quan, Junhyuk Oh, Hado van Hasselt, David Silver, and Tom Schaul. Unicorn: Continual learning with a universal, off-policy agent. arXiv, abs/1802.08294, 2018.
  • Nantas Nardelli, Gabriel Synnaeve, Zeming Lin, Pushmeet Kohli, Philip H.S. Torr, and Nicolas Usunier. Value propagation networks. In ICLR, 2019.
  • NetHack Wiki. NetHackWiki. https://nethackwiki.com/, 2020. Accessed:2020-02-01.
  • Alex Nichol, Vicki Pfau, Christopher Hesse, Oleg Klimov, and John Schulman. Gotta learn fast: A new benchmark for generalization in rl. arXiv preprint arXiv:1804.03720, 2018.
  • Santiago Ontanón, Gabriel Synnaeve, Alberto Uriarte, Florian Richoux, David Churchill, and Mike Preuss. A survey of real-time strategy game ai research and competition in starcraft. IEEE Transactions on Computational Intelligence and AI in Games, 2013.
  • Georg Ostrovski, Marc G Bellemare, Aäron van den Oord, and Rémi Munos. Count-based exploration with neural density models. In ICML, 2017.
  • German I. Parisi, Ronald Kemker, Jose L. Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review. Neural Networks, 2019.
  • Ronald Parr and Stuart J. Russell. Reinforcement learning with hierarchies of machines. In NeurIPS, 1997.
  • Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. In ICML, 2017.
  • Roberta Raileanu and Tim Rocktäschel. RIDE: Rewarding impact-driven exploration for procedurally-generated environments. In ICLR, 2020.
  • Eric S. Raymond. A Guide to the Mazes of Menace, 1987.
  • Eric S. Raymond, Mike Stephenson, et al. A Guide to the Mazes of Menace: Guidebook for NetHack. NetHack DevTeam, 2020. URL http://www.nethack.org/download/3.6.5/nethack-365-Guidebook.pdf.
  • Sebastian Risi and Julian Togelius. Procedural content generation: From automatically generating game levels to increasing generality in machine learning. CoRR, abs/1911.13071, 2019.
  • David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne. Experience replay for continual learning. In NeurIPS, 2019.
  • Tim Salimans and Richard Chen. Learning montezuma’s revenge from a single demonstration. CoRR, abs/1812.03381, 2018.
  • Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G.J. Rudner, Chia-Man Hung, Philip H.S. Torr, Jakob Foerster, and Shimon Whiteson. The StarCraft Multi-Agent Challenge. In AAMAS, 2019.
  • Gabriel Synnaeve, Nantas Nardelli, Alex Auvolat, Soumith Chintala, Timothée Lacroix, Zeming Lin, Florian Richoux, and Nicolas Usunier. TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games. arXiv preprint arXiv:1611.00625, 2016.
  • TAEB. TAEB Documentation: Other Bots. https://taeb.github.io/bots.html, 2015. Accessed:2020-01-19.
  • Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, OpenAI Xi Chen, Yan Duan, John Schulman, Filip DeTurck, and Pieter Abbeel. # Exploration: A study of count-based exploration for deep reinforcement learning. In NeurIPS, 2017.
  • Matthew E. Taylor and Peter Stone. Transfer learning for reinforcement learning domains: A survey. Journal Machine Learning Research, 2009.
  • Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Manfred Otto Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu. Feudal networks for hierarchical reinforcement learning. In ICML, 2017.
  • Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, et al. StarCraft II: A New Challenge for Reinforcement Learning. arXiv preprint arXiv:1708.04782, 2017.
  • Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 2019.
  • Chang Ye, Ahmed Khalifa, Philip Bontrager, and Julian Togelius. Rotation, translation, and cropping for zero-shot generalization. CoRR, abs/2001.09908, 2020.
  • Victor Zhong, Tim Rocktäschel, and Edward Grefenstette. RTFM: Generalising to new environment dynamics via reading. In ICLR, 2020.