OpenAI Gym.

CoRR, (2016)


Abstract

OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.

Introduction
  • Reinforcement learning (RL) is the branch of machine learning that is concerned with making sequences of decisions.
  • A variety of benchmarks have been released, such as the Arcade Learning Environment (ALE) [5], which exposed a collection of Atari 2600 games as reinforcement learning problems, and recently the RLLab benchmark for continuous control [6], to which the authors refer the reader for a survey on other RL benchmarks, including [7, 8, 9, 10, 11].
  • OpenAI Gym has a website where one can find scoreboards for all of the environments, showcasing results submitted by users.
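The common interface mentioned above can be sketched with a toy stand-in environment. This is an illustrative sketch, not Gym's actual code: the `CoinFlipEnv` class and its horizon are invented for the example, while the method signatures follow the classic Gym convention of `reset() -> observation` and `step(action) -> (observation, reward, done, info)`; real code would instead call something like `gym.make("CartPole-v0")`.

```python
import random

class CoinFlipEnv:
    """Toy stand-in for a Gym environment (hypothetical example).
    Follows the classic Gym interface: reset() returns an observation,
    step(action) returns (observation, reward, done, info)."""

    def __init__(self, horizon=10):
        self.horizon = horizon  # episode length in timesteps
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # observation: current timestep

    def step(self, action):
        # Reward 1 when the agent's guess matches a fair coin flip.
        self.t += 1
        reward = 1.0 if action == random.randint(0, 1) else 0.0
        done = self.t >= self.horizon
        return self.t, reward, done, {}

# Random-agent loop over one episode, in the canonical Gym style.
env = CoinFlipEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = random.choice([0, 1])  # a real agent would condition on obs
    obs, reward, done, info = env.step(action)
    total_reward += reward
```

The same loop works unchanged for any environment exposing this interface, which is exactly the point of the shared API.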
Highlights
  • Reinforcement learning (RL) is the branch of machine learning that is concerned with making sequences of decisions
  • A variety of benchmarks have been released, such as the Arcade Learning Environment (ALE) [5], which exposed a collection of Atari 2600 games as reinforcement learning problems, and recently the RLLab benchmark for continuous control [6], to which we refer the reader for a survey on other RL benchmarks, including [7, 8, 9, 10, 11]
  • An RL algorithm seeks to maximize some measure of the agent’s total reward, as the agent interacts with the environment
  • OpenAI Gym focuses on the episodic setting of reinforcement learning, where the agent’s experience is broken down into a series of episodes
  • OpenAI Gym contains a collection of Environments (POMDPs), which will grow over time
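The growing collection of environments behind one interface can be mimicked with a small registry. The `register`/`make` names below echo Gym's `gym.envs.registration` module, but this is an illustrative sketch under that assumption, not Gym's actual implementation; the `ConstantEnv` class and the `"Constant-v0"` id are invented for the example.

```python
# Minimal environment registry sketch (illustrative; Gym's real registry
# also tracks versions, reward thresholds, and other metadata).
_REGISTRY = {}

def register(env_id, entry_point, **kwargs):
    """Associate a string id like "Constant-v0" with a constructor."""
    if env_id in _REGISTRY:
        raise ValueError(f"{env_id} already registered")
    _REGISTRY[env_id] = (entry_point, kwargs)

def make(env_id):
    """Instantiate a registered environment by id, like gym.make."""
    entry_point, kwargs = _REGISTRY[env_id]
    return entry_point(**kwargs)

class ConstantEnv:
    """Trivial one-step POMDP with a fixed reward (hypothetical)."""
    def __init__(self, reward=1.0):
        self.reward = reward
    def reset(self):
        return 0
    def step(self, action):
        return 0, self.reward, True, {}

register("Constant-v0", ConstantEnv, reward=5.0)
env = make("Constant-v0")
obs = env.reset()
obs, reward, done, info = env.step(0)
```

Keeping environments behind string ids is what lets the collection grow over time without changing user code.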
Results
  • Reinforcement learning assumes that there is an agent that is situated in an environment.
  • The agent takes an action, and it receives an observation and reward from the environment.
  • An RL algorithm seeks to maximize some measure of the agent’s total reward, as the agent interacts with the environment.
  • The goal in episodic reinforcement learning is to maximize the expectation of total reward per episode, and to achieve a high level of performance in as few episodes as possible.
  • The design of OpenAI Gym is based on the authors’ experience developing and comparing reinforcement learning algorithms, and the experience using previous benchmark collections.
  • One could imagine an “online learning” style, where the agent takes the observation, reward, and done signal as input at each timestep and performs learning updates incrementally.
  • In an alternative “batch update” style, the agent is called with the observation as input, while the reward information is collected separately by the RL algorithm and later used to compute an update.
  • The performance of an RL algorithm on an environment can be measured along two axes: first, the final performance; second, the amount of time it takes to learn—the sample complexity.
  • Learning time can be measured in multiple ways; one simple scheme is to count the number of episodes before a threshold level of average performance is exceeded.
  • This threshold is chosen per-environment in an ad-hoc way, for example, as 90% of the maximum performance achievable by a very heavily trained agent.
  • The OpenAI Gym website allows users to compare the performance of their algorithms.
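The threshold-counting scheme described above can be sketched as follows. The function name, the 100-episode window, and the synthetic learning curve are all illustrative choices made for this example, not part of Gym's API; only the idea (count episodes until a trailing average exceeds a per-environment threshold) comes from the text.

```python
def episodes_to_threshold(episode_rewards, threshold, window=100):
    """Return the number of episodes consumed before the average reward
    over the last `window` episodes first exceeds `threshold`, or None
    if the threshold is never reached (hypothetical helper)."""
    for i in range(window, len(episode_rewards) + 1):
        recent = episode_rewards[i - window:i]
        if sum(recent) / window > threshold:
            return i
    return None

# Synthetic learning curve: per-episode reward rises linearly, capped at 200.
rewards = [min(200.0, 0.5 * t) for t in range(1000)]

# With a maximum of 200, a 90%-of-max threshold (as in the ad-hoc scheme
# described above) is 180.
n = episodes_to_threshold(rewards, threshold=0.9 * 200, window=100)
```

Here `n` captures sample complexity: an algorithm whose curve rises faster reaches the threshold in fewer episodes, even if both eventually attain the same final performance.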
Conclusion
  • The aim of the OpenAI Gym scoreboards is not to create a competition, but rather to stimulate the sharing of code and ideas, and to be a meaningful benchmark for assessing different methods.
  • OpenAI Gym asks users to create a Writeup describing their algorithm, parameters used, and linking to code.
  • OpenAI Gym contains a collection of Environments (POMDPs), which will grow over time.
Reference
  • Dimitri P. Bertsekas. Dynamic programming and optimal control. Athena Scientific, Belmont, MA, 1995.
  • V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
  • J. Schulman, S. Levine, P. Abbeel, M. I. Jordan, and P. Moritz. Trust region policy optimization. In ICML, pages 1889–1897, 2015.
  • Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy P Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. arXiv preprint arXiv:1602.01783, 2016.
  • M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The Arcade Learning Environment: An evaluation platform for general agents. J. Artif. Intell. Res., 47:253–279, 2013.
  • Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. arXiv preprint arXiv:1604.06778, 2016.
  • A. Geramifard, C. Dann, R. H. Klein, W. Dabney, and J. P. How. RLPy: A value-function-based reinforcement learning framework for education and research. J. Mach. Learn. Res., 16:1573–1578, 2015.
  • B. Tanner and A. White. RL-Glue: Language-independent software for reinforcement-learning experiments. J. Mach. Learn. Res., 10:2133–2136, 2009.
  • T. Schaul, J. Bayer, D. Wierstra, Y. Sun, M. Felder, F. Sehnke, T. Ruckstieß, and J. Schmidhuber. PyBrain. J. Mach. Learn. Res., 11:743–746, 2010.
  • S. Abeyruwan. RLLib: Lightweight standard and on/off policy reinforcement learning library (C++). http://web.cs.miami.edu/home/saminda/rilib.html, 2013.
  • Christos Dimitrakakis, Guangliang Li, and Nikolaos Tziortziotis. The reinforcement learning competition 2014. AI Magazine, 35(3):61–65, 2014.
  • R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
  • Petr Baudis and Jean-loup Gailly. Pachi: State of the art open source go program. In Advances in Computer Games.
  • Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 5026–5033. IEEE, 2012.
  • Michał Kempka, Marek Wydmuch, Grzegorz Runc, Jakub Toczek, and Wojciech Jaskowski. Vizdoom: A doom-based ai research platform for visual reinforcement learning. arXiv preprint arXiv:1605.02097, 2016.
Author
Greg Brockman
Vicki Cheung
Ludwig Pettersson
Jonas Schneider
John Schulman
Jie Tang