Emergent Tool Use From Multi-Agent Autocurricula

ICLR (2020)

Abstract

Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination. We find clear evidence of six emergent phases in agent strategy in our environment, each of which creates a new pressure for the opposing team to adapt; for instance, agents learn to build multi-object shelters using moveable boxes, which in turn leads to agents discovering that they can overcome obstacles using ramps.
Introduction
  • Creating intelligent artificial agents that can solve a wide variety of complex human-relevant tasks has been a long-standing challenge in the artificial intelligence community.
  • One approach to creating these agents is to explicitly specify desired tasks and train a reinforcement learning (RL) agent to solve them.
  • On this front, there has been much recent progress in solving physically grounded tasks, e.g. dexterous in-hand manipulation (Rajeswaran et al., 2017; Andrychowicz et al., 2018) or locomotion of complex bodies (Schulman et al., 2015; Heess et al., 2017).
  • The learned skills in these single-agent RL settings are inherently bounded by the task description; once the agent has learned to solve the task, there is little room to improve.
Highlights
  • Creating intelligent artificial agents that can solve a wide variety of complex human-relevant tasks has been a long-standing challenge in the artificial intelligence community.
  • We observe signs of dynamic and growing complexity resulting from multi-agent competition and standard reinforcement learning algorithms; we find that agents go through as many as six distinct adaptations of strategy and counter-strategy, which are depicted in Figure 1.
  • We find that for a number of the tests, agents pretrained in hide-and-seek learn faster or achieve higher final performance than agents trained from scratch or pretrained with intrinsic motivation; we find that the performance differences are not drastic, indicating that much of the skill and feature representations learned in hide-and-seek are entangled and hard to fine-tune.
  • We conjecture that tasks where hide-and-seek pretraining outperforms the baseline are due to reuse of learned feature representations, whereas better-than-baseline transfer on the remaining tasks would require reuse of learned skills, which is much more difficult. This evaluation metric highlights the need for developing techniques to reuse skills effectively from a policy trained in one environment to another
  • We have demonstrated that simple game rules, multi-agent competition, and standard reinforcement learning algorithms at scale can induce agents to learn complex strategies and skills (a minimal sketch of the team-based reward behind these simple game rules follows this list).
  • Our results with hide-and-seek should be viewed as a proof of concept showing that multi-agent autocurricula can lead to physically grounded and human-relevant behavior.
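The "simple game rules" mentioned above can be made concrete with a short sketch of a team-based, visibility-driven hide-and-seek reward. This is a minimal illustration under stated assumptions: the helper line_of_sight and the preparation-phase length are placeholders, not the authors' implementation.

    # Minimal sketch of a team-based hide-and-seek reward, assuming a
    # visibility predicate and a fixed preparation phase. Constants and
    # helper names are illustrative, not the authors' implementation.

    def hide_and_seek_reward(step, hiders, seekers, line_of_sight, prep_steps=96):
        """Return (hider_reward, seeker_reward) for one timestep.

        hiders, seekers: lists of agent states (e.g. positions).
        line_of_sight(a, b): True if agent a can see agent b (assumed helper).
        """
        if step < prep_steps:
            # Preparation phase: no reward is given, so hiders can move boxes
            # and build shelters before the seekers start pursuing them.
            return 0.0, 0.0

        any_hider_seen = any(
            line_of_sight(seeker, hider)
            for seeker in seekers
            for hider in hiders
        )

        # Zero-sum team reward: hiders get +1 only if every hider is hidden
        # from every seeker, -1 otherwise; seekers receive the negation.
        hider_reward = -1.0 if any_hider_seen else 1.0
        return hider_reward, -hider_reward

Because the reward is zero-sum at the team level, any strategy one team discovers (for example, shelter building) immediately becomes a training pressure on the other team, which is the mechanism behind the rounds of strategy and counter-strategy described above.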
Results
  • In Figure 6 the authors show the performance on the suite of tasks for the hide-and-seek, count-based, and trained-from-scratch policies across 3 seeds (the fine-tuning protocol behind this comparison is sketched after this list).
  • The authors conjecture that tasks where hide-and-seek pretraining outperforms the baseline are due to reuse of learned feature representations, whereas better-than-baseline transfer on the remaining tasks would require reuse of learned skills, which is much more difficult.
  • This evaluation metric highlights the need for developing techniques to reuse skills effectively from a policy trained in one environment to another.
  • As future environments become more diverse and agents must use skills in more contexts, the authors may see more generalizable skill representations and more significant signal in this evaluation approach.
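The evaluation described in this list amounts to initializing a policy either from hide-and-seek pretraining, from count-based intrinsic-motivation pretraining, or from scratch, fine-tuning it on each target task, and comparing learning speed and final performance across seeds. The sketch below illustrates that protocol only; the checkpoint paths, function names, and parameters are hypothetical placeholders, not the authors' code.

    # Hypothetical sketch of the transfer/fine-tuning evaluation protocol:
    # fine-tune from different pretrained checkpoints and from scratch on a
    # target task, then compare results per seed. All names are placeholders.

    PRETRAINING = {
        "hide_and_seek": "checkpoints/hide_and_seek.pt",  # assumed path
        "count_based": "checkpoints/count_based.pt",      # assumed path
        "from_scratch": None,                             # random-init baseline
    }

    def evaluate_transfer(make_task_env, make_policy, finetune, seeds=(0, 1, 2)):
        """Return {pretraining_name: [final score per seed]} on one target task."""
        results = {}
        for name, checkpoint in PRETRAINING.items():
            scores = []
            for seed in seeds:
                policy = make_policy(seed=seed)
                if checkpoint is not None:
                    policy.load_weights(checkpoint)  # reuse pretrained features/skills
                env = make_task_env(seed=seed)
                scores.append(finetune(policy, env))  # fine-tune, report final score
            results[name] = scores
        return results

Faster learning or a higher final score for the hide-and-seek row on a given task is then read as evidence that its pretrained feature representations or skills transferred to that task.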
Conclusion
  • The authors have demonstrated that simple game rules, multi-agent competition, and standard reinforcement learning algorithms at scale can induce agents to learn complex strategies and skills.
  • The authors observed emergence of as many as six distinct rounds of strategy and counter-strategy, suggesting that multi-agent self-play with simple game rules in sufficiently complex environments could lead to open-ended growth in complexity.
  • In order to support further research in multi-agent autocurricula, the authors are open-sourcing the environment code.
Contributions
  • Finds that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination.
  • Finds clear evidence of six emergent phases in agent strategy in our environment, each of which creates a new pressure for the opposing team to adapt; for instance, agents learn to build multi-object shelters using moveable boxes which in turn leads to agents discovering that they can overcome obstacles using ramps.
  • Introduces a new mixed competitive and cooperative physics-based environment in which agents compete in a simple game of hide-and-seek.
  • Presents evidence that multi-agent co-adaptation may scale better with environment complexity and qualitatively centers around more human-interpretable behavior than intrinsically motivated agents (a count-based intrinsic bonus of the kind used as a baseline is sketched after this list).
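The intrinsic-motivation baseline referred to throughout is a count-based exploration bonus in the style of Bellemare et al. (2016) and Tang et al. (2017): observations are mapped to discrete cells and the agent receives an extra reward that shrinks with how often each cell has been visited. The discretization, coefficient, and class name below are illustrative assumptions, not the paper's exact settings.

    from collections import defaultdict
    import math

    # Illustrative count-based exploration bonus: discretize the agent's
    # position, count visits to each cell, and pay a bonus proportional
    # to 1/sqrt(count). Grid size and coefficient are assumed values.
    class CountBasedBonus:
        def __init__(self, cell_size=1.0, coef=0.1):
            self.counts = defaultdict(int)
            self.cell_size = cell_size
            self.coef = coef

        def __call__(self, agent_position):
            cell = tuple(int(x // self.cell_size) for x in agent_position)
            self.counts[cell] += 1
            return self.coef / math.sqrt(self.counts[cell])

    # Usage during pretraining:
    #   bonus = CountBasedBonus()
    #   shaped_reward = env_reward + bonus(agent_position)

Pretraining with this bonus encourages coverage of the state space, which is the behavior the paper contrasts with the more goal-directed, human-interpretable strategies that emerge from hide-and-seek competition.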
References
  • Joshua Achiam and Shankar Sastry. Surprise-based intrinsic motivation for deep reinforcement learning. arXiv preprint arXiv:1703.01732, 2017.
  • Kelsey R Allen, Kevin A Smith, and Joshua B Tenenbaum. The tools challenge: Rapid trial-anderror learning in physical problem solving. arXiv preprint arXiv:1907.09620, 2019.
  • Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mane. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
  • Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al. Learning dexterous in-hand manipulation. arXiv preprint arXiv:1808.00177, 2018.
  • Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
  • Rene Baillargeon and Susan Carey. Core cognition and beyond: The acquisition of physical and numerical knowledge. In S. Pauen (Ed.), Early Childhood Development and Later Outcome, pp. 33–65. University Press, 2012.
  • Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, and Igor Mordatch. Emergent complexity via multi-agent competition. In International Conference on Learning Representations, 2018.
  • Victor Bapst, Alvaro Sanchez-Gonzalez, Carl Doersch, Kimberly Stachenfeld, Pushmeet Kohli, Peter Battaglia, and Jessica Hamrick. Structured agents for physical construction. In International Conference on Machine Learning, pp. 464–474, 2019.
  • Marc Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, and Remi Munos. Unifying count-based exploration and intrinsic motivation. In Advances in Neural Information Processing Systems, pp. 1471–1479, 2016.
  • Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, and Alexei A. Efros. Large-scale study of curiosity-driven learning. In International Conference on Learning Representations, 2019a.
  • Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. Exploration by random network distillation. In International Conference on Learning Representations, 2019b.
  • AD Channon, RI Damper, et al. Perpetuating evolutionary emergence. From Animals to Animats 5, pp. 534–539, 1998.
  • Nuttapong Chentanez, Andrew G Barto, and Satinder P Singh. Intrinsically motivated reinforcement learning. In Advances in neural information processing systems, pp. 1281–1288, 2005.
  • Richard Dawkins and John Richard Krebs. Arms races between and within species. Proceedings of the Royal Society of London. Series B. Biological Sciences, 205(1161):489–511, 1979.
  • Carlos Diuk, Andre Cohen, and Michael L Littman. An object-oriented representation for efficient reinforcement learning. In Proceedings of the 25th international conference on Machine learning, pp. 240–247. ACM, 2008.
  • Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, and Wojciech Zaremba. One-shot imitation learning. In Advances in neural information processing systems, pp. 1087–1098, 2017.
  • Saso Dzeroski, Luc De Raedt, and Kurt Driessens. Relational reinforcement learning. Machine learning, 43(1-2):7–52, 2001.
  • A.E. Elo. The rating of chessplayers, past and present. Arco Pub., 1978. ISBN 9780668047210. URL https://books.google.com/books?id=8pMnAQAAMAAJ.
  • Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, and Pieter Abbeel. Reverse curriculum generation for reinforcement learning. In Conference on Robot Learning, pp. 482–495, 2017.
  • Jakob Foerster, Ioannis Alexandros Assael, Nando de Freitas, and Shimon Whiteson. Learning to communicate with deep multi-agent reinforcement learning. In Advances in Neural Information Processing Systems, pp. 2137–2145, 2016.
  • Jakob N Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • Sebastien Forestier, Yoan Mollard, and Pierre-Yves Oudeyer. Intrinsically motivated goal exploration processes with automatic curriculum learning. arXiv preprint arXiv:1708.02190, 2017.
  • Nick Haber, Damian Mrowca, Stephanie Wang, Li F Fei-Fei, and Daniel L Yamins. Learning to play with intrinsically-motivated, self-aware agents. In Advances in Neural Information Processing Systems, pp. 8388–8399, 2018.
  • Nicolas Heess, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, SM Eslami, Martin Riedmiller, et al. Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286, 2017.
  • Ralf Herbrich, Tom Minka, and Thore Graepel. Trueskill: a bayesian skill rating system. In Advances in neural information processing systems, pp. 569–576, 2007.
  • Sepp Hochreiter and Jurgen Schmidhuber. Long short-term memory. Neural computation, 9(8): 1735–1780, 1997.
  • Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel. Vime: Variational information maximizing exploration. In Advances in Neural Information Processing Systems, pp. 1109–1117, 2016.
  • Gavin R Hunt. Manufacture and use of hook-tools by new caledonian crows. Nature, 379(6562): 249, 1996.
  • Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, et al. Population based training of neural networks. arXiv preprint arXiv:1711.09846, 2017.
  • Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, and Thore Graepel. Human-level performance in 3d multiplayer games with population-based reinforcement learning. Science, 364(6443):859–865, 2019. ISSN 0036-8075. doi: 10.1126/science.aau6249. URL https://science.sciencemag.org/content/364/6443/859.
  • Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro Ortega, Dj Strouse, Joel Z Leibo, and Nando De Freitas. Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In International Conference on Machine Learning, pp. 3040–3049, 2019.
  • Joel Lehman, Jeff Clune, Dusan Misevic, Christoph Adami, Lee Altenberg, Julie Beaulieu, Peter J Bentley, Samuel Bernard, Guillaume Beslon, David M Bryson, et al. The surprising creativity of digital evolution: A collection of anecdotes from the evolutionary computation and artificial life research communities. arXiv preprint arXiv:1803.03453, 2018.
  • Joel Z Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel. Multi-agent reinforcement learning in sequential social dilemmas. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp. 464–473. International Foundation for Autonomous Agents and Multiagent Systems, 2017.
  • Joel Z Leibo, Edward Hughes, Marc Lanctot, and Thore Graepel. Autocurricula and the emergence of innovation from social interaction: A manifesto for multi-agent intelligence research. arXiv preprint arXiv:1903.00742, 2019a.
  • Joel Z Leibo, Julien Perolat, Edward Hughes, Steven Wheelwright, Adam H Marblestone, Edgar Duenez-Guzman, Peter Sunehag, Iain Dunning, and Thore Graepel. Malthusian reinforcement learning. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1099–1107. International Foundation for Autonomous Agents and Multiagent Systems, 2019b.
  • Siqi Liu, Guy Lever, Nicholas Heess, Josh Merel, Saran Tunyasuvunakool, and Thore Graepel. Emergent coordination through competition. In International Conference on Learning Representations, 2019.
  • Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actorcritic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pp. 6379–6390, 2017.
  • Shakir Mohamed and Danilo Jimenez Rezende. Variational information maximisation for intrinsically motivated reinforcement learning. In Advances in neural information processing systems, pp. 2125–2133, 2015.
  • Igor Mordatch and Pieter Abbeel. Emergence of grounded compositional language in multi-agent populations. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • Charles Ofria and Claus O Wilke. Avida: A software platform for research in computational evolutionary biology. Artificial life, 10(2):191–229, 2004.
  • OpenAI. OpenAI Five. https://blog.openai.com/openai-five/, 2018.
  • Georg Ostrovski, Marc G Bellemare, Aaron van den Oord, and Remi Munos. Count-based exploration with neural density models. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2721–2730. JMLR. org, 2017.
  • Jan Paredis. Coevolutionary computation. Artificial life, 2(4):355–375, 1995.
  • Deepak Pathak, Pulkit Agrawal, Alexei A Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. In International Conference on Machine Learning, pp. 2778–2787, 2017.
  • Julien Perolat, Joel Z Leibo, Vinicius Zambaldi, Charles Beattie, Karl Tuyls, and Thore Graepel. A multi-agent reinforcement learning model of common-pool resource appropriation. In Advances in Neural Information Processing Systems, pp. 3643–3652, 2017.
  • Lerrel Pinto, Marcin Andrychowicz, Peter Welinder, Wojciech Zaremba, and Pieter Abbeel. Asymmetric actor critic for image-based robot learning. arXiv preprint arXiv:1710.06542, 2017.
  • Cambridge, MA: The MIT Press, 1997.
  • Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, Giulia Vezzani, John Schulman, Emanuel Todorov, and Sergey Levine. Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. arXiv preprint arXiv:1709.10087, 2017.
  • Thomas S Ray. Evolution, ecology and optimization of digital organisms. Technical report, Citeseer, 1992.
  • Christopher D Rosin and Richard K Belew. Methods for competitive co-evolution: Finding opponents worth beating. In ICGA, pp. 373–381, 1995.
  • Jurgen Schmidhuber. A possibility for implementing curiosity and boredom in model-building neural controllers. In Proc. of the international conference on simulation of adaptive behavior: From animals to animats, pp. 222–227, 1991.
  • John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. Highdimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015.
  • John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  • Robert W Shumaker, Kristina R Walkup, and Benjamin B Beck. Animal tool behavior: the use and manufacture of tools by animals. JHU Press, 2011.
  • David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of go with deep neural networks and tree search. nature, 529(7587):484, 2016.
  • David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et al. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815, 2017.
  • Karl Sims. Evolving virtual creatures. In Proceedings of the 21st annual conference on Computer graphics and interactive techniques, pp. 15–22. ACM, 1994a.
  • Karl Sims. Evolving 3d morphology and behavior by competition. Artificial life, 1(4):353–372, 1994b.
  • Satinder Singh, Richard L Lewis, Andrew G Barto, and Jonathan Sorg. Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Transactions on Autonomous Mental Development, 2(2):70–82, 2010.
  • L Soros and Kenneth Stanley. Identifying necessary conditions for open-ended evolution through the artificial life world of chromaria. In Artificial Life Conference Proceedings 14, pp. 793–800. MIT Press, 2014.
  • Bradly C Stadie, Sergey Levine, and Pieter Abbeel. Incentivizing exploration in reinforcement learning with deep predictive models. arXiv preprint arXiv:1507.00814, 2015.
  • Kenneth O Stanley and Risto Miikkulainen. Competitive coevolution through evolutionary complexification. Journal of artificial intelligence research, 21:63–100, 2004.
  • Alexander L Strehl and Michael L Littman. An analysis of model-based interval estimation for markov decision processes. Journal of Computer and System Sciences, 74(8):1309–1331, 2008.
  • Sainbayar Sukhbaatar, Rob Fergus, et al. Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems, pp. 2244–2252, 2016.
  • Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, and Rob Fergus. Intrinsic motivation and automatic curricula via asymmetric self-play. In International Conference on Learning Representations, 2018.
  • Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018.
  • Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip DeTurck, and Pieter Abbeel. # exploration: A study of count-based exploration for deep reinforcement learning. In Advances in neural information processing systems, pp. 2753–2762, 2017.
  • Tim Taylor. Requirements for open-ended evolution in natural and artificial systems. arXiv preprint arXiv:1507.07403, 2015.
  • Gerald Tesauro. Temporal difference learning and td-gammon. Communications of the ACM, 38(3): 58–68, 1995.
  • Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. IEEE, 2012.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008, 2017.
  • Oriol Vinyals, Igor Babuschkin, Junyoung Chung, Michael Mathieu, Max Jaderberg, et al. AlphaStar: Mastering the real-time strategy game StarCraft II. https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/, 2019.
  • Rui Wang, Joel Lehman, Jeff Clune, and Kenneth O Stanley. Paired open-ended trailblazer (poet): Endlessly generating increasingly complex and diverse learning environments and their solutions. arXiv preprint arXiv:1901.01753, 2019.
  • Annie Xie, Frederik Ebert, Sergey Levine, and Chelsea Finn. Improvisation through physical understanding: Using novel objects as tools with visual foresight. arXiv preprint arXiv:1904.05538, 2019.
  • Larry Yaeger. Computational genetics, physiology, metabolism, neural systems, learning, vision, and behavior or PolyWorld: Life in a new context. In Santa Fe Institute Studies in the Sciences of Complexity, volume 17, pp. 263–263. Addison-Wesley, 1994.
  • Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, et al. Relational deep reinforcement learning. arXiv preprint arXiv:1806.01830, 2018.
Authors
Bowen Baker
Ingmar Kanitscheider
Todor Markov
Glenn Powell
Bob McGrew