Relational Deep Reinforcement Learning

Vinícius Flores Zambaldi
David Raposo
Igor Babuschkin
Victoria Langston

arXiv preprint arXiv:1806.01830, 2018.

Keywords:
relational reasoning, structured perception, deep RL model, overall performance, deep RL architecture

Abstract:

We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. […]

Introduction
  • Recent advances in deep reinforcement learning [1, 2, 3] are in part driven by a capacity to learn good internal representations to inform an agent’s policy.
  • Deep RL models still face important limitations, namely, low sample efficiency and a propensity not to generalize to seemingly minor changes in the task [4, 5, 6, 7].
  • These limitations suggest that large capacity deep RL models tend to overfit to the abundant data on which they are trained, and fail to learn an abstract, interpretable, and generalizable understanding of the problem they are trying to solve.
  • The authors' approach advocates learned and reusable entity- and relation-centric functions [10, 11, 12] to implicitly reason [13] over relational representations.
Highlights
  • Recent advances in deep reinforcement learning [1, 2, 3] are in part driven by a capacity to learn good internal representations to inform an agent’s policy.
  • Deep reinforcement learning models still face important limitations, namely, low sample efficiency and a propensity not to generalize to seemingly minor changes in the task [4, 5, 6, 7].
  • These limitations suggest that large capacity deep reinforcement learning models tend to overfit to the abundant data on which they are trained, and fail to learn an abstract, interpretable, and generalizable understanding of the problem they are trying to solve.
  • Instead of trying to directly characterize the internal representations, we appealed to: (1) a behavioural analysis, and (2) an analysis of the attention weights the model used to compute entity-entity interactions. (1) showed that the learned representations allowed for better generalization, which is characteristic of relational representations. (2) showed that the model’s internal computations were interpretable, and congruent with the computations we would expect from a model computing task-relevant relations.
  • Future work could draw on computer vision for more sophisticated structured perceptual reasoning mechanisms (e.g., [34]), and on hierarchical reinforcement learning and planning [35, 36] to allow structured representations and reasoning to translate more fully into structured behaviors.
Methods
  • Box-World is a perceptually simple but combinatorially complex environment that requires abstract relational reasoning and planning.
  • It consists of a 12 × 12 pixel room with keys and boxes randomly scattered.
  • The room contains an agent, represented by a single dark gray pixel, which can move in four directions: up, down, left, right.
  • Keys are represented by a single colored pixel.
  • Boxes are represented by two adjacent colored pixels: the pixel on the right represents the box’s lock, and its color indicates which key can be used to open that lock; the pixel on the left indicates the content of the box, which is inaccessible while the box is locked (see the sketch below).
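To make the layout concrete, here is a minimal sketch of how a Box-World-style observation could be encoded as a 12 × 12 RGB array. The colour values and helper functions are hypothetical illustrations chosen for this example, not the environment's actual implementation.

```python
import numpy as np

GRID = 12  # Box-World rooms are 12 x 12 pixels

# Hypothetical colour palette: dark gray agent, one colour per key/lock type.
AGENT  = (64, 64, 64)
YELLOW = (255, 220, 0)   # e.g. a loose key the agent can pick up
BLUE   = (0, 90, 255)    # e.g. a key stored inside a box

def empty_room():
    """Return a blank 12 x 12 RGB observation (white background)."""
    return np.full((GRID, GRID, 3), 255, dtype=np.uint8)

def place_agent(obs, r, c):
    obs[r, c] = AGENT        # agent: a single dark gray pixel

def place_key(obs, r, c, colour):
    obs[r, c] = colour       # loose key: a single coloured pixel

def place_box(obs, r, c, content, lock):
    obs[r, c] = content      # left pixel: content, inaccessible while locked
    obs[r, c + 1] = lock     # right pixel: lock; its colour names the key that opens it

obs = empty_room()
place_agent(obs, 6, 6)
place_key(obs, 2, 3, YELLOW)                      # a yellow key lying in the open
place_box(obs, 9, 4, content=BLUE, lock=YELLOW)   # a yellow lock guarding a blue key
```

Under this encoding, solving a level amounts to collecting keys and opening locks in the right order, which is exactly the sequential, relational structure the task is designed to probe.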
Results
  • In the StarCraft II Learning Environment, the agent achieves state-of-the-art performance on six mini-games – surpassing human grandmaster performance on four.
  • Agents augmented with the relational module achieved close to optimal performance in the two variants of this task, solving more than 98% of the levels.
  • The authors' control agents, which can only rely on convolutional and fully-connected layers, performed significantly worse, solving less than 75% of the levels across the two task variants.
  • In the first generalization condition (levels requiring longer solution sequences than seen during training), the agent with the relational module solved more than 88% of the levels across all three solution lengths.
Conclusion
  • By introducing structured perception and relational reasoning into deep RL architectures, the agents can learn interpretable representations, and exceed baseline agents in terms of sample complexity, ability to generalize, and overall performance.
  • This demonstrates key benefits of marrying insights from RRL with the representational power of deep learning.
  • The authors' work opens new directions for RL via a principled hybrid of flexible statistical learning and more structured approaches.
Summary
  • Recent advances in deep reinforcement learning [1, 2, 3] are in part driven by a capacity to learn good internal representations to inform an agent’s policy.
  • Our contributions are as follows: (1) we create and analyze an RL task called Box-World that explicitly targets relational reasoning, and demonstrate that agents with a capacity to produce relational representations using a non-local computation based on attention [14] exhibit interesting generalization behaviors compared to those that do not, and (2) we apply the agent to a difficult problem – the StarCraft II mini-games [15] – and achieve state-of-the-art performance on six minigames.
  • We translate ideas from RRL into architecturally specified inductive biases within a deep RL agent, using neural network models that operate on structured representations of a scene – sets of entities – and perform relational reasoning via iterated, message-passing-like modes of processing.
  • The entities correspond to local regions of an image, and the agent learns to attend to key objects and compute their pairwise and higher-order interactions (a minimal sketch of this attention step follows this summary).
  • We equip a deep RL agent with architectural inductive biases that may be better suited for learning relations, rather than specifying them as background knowledge as in RRL.
  • A key can only be used once, so the agent must be able to reason about whether a particular box is along a distractor branch or along the solution path.
  • The attention weights captured a link between a key and its corresponding lock, using a shared computation across entities.
  • If the function used to compute the attention weights has learned to represent some general, abstract notion of what it means to “unlock” – e.g., a relation like unlocks(key, lock) – then this function should be able to generalize to key-lock combinations that it has never observed during training.
  • We tested the model under two conditions, without further training: (1) on levels that required opening a longer sequence of boxes than it had ever observed (6, 8 and 10), and (2) on levels that required using a key-lock combination that was never required for reaching the gem during training, instead only being placed on distractor paths.
  • The agent augmented with a relational module achieved state-of-the-art results in six mini-games and its performance surpassed that of the human grandmaster in four of them.
  • By introducing structured perception and relational reasoning into deep RL architectures, our agents can learn interpretable representations, and exceed baseline agents in terms of sample complexity, ability to generalize, and overall performance.
  • Our inductive biases for entity- and relation-centric representations and iterated reasoning reflect key knowledge about the structure of the world.
  • It will be important to further explore the semantics of the agent’s learned representations, through the lens of what one might hard-code in traditional RRL.
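As a rough illustration of the iterated, attention-based relational computation summarized above, the sketch below implements single-head dot-product self-attention over entity vectors taken from a CNN feature map with (x, y) coordinates appended. It is a simplified stand-in, assuming hypothetical weight matrices and dimensions; the paper's agent uses multi-head dot-product attention (MHDPA) with layer normalization, residual connections, and repeated blocks before the policy and value outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def extract_entities(feature_map):
    """Flatten a CNN feature map (H, W, C) into N = H*W entity vectors,
    appending each location's normalized (x, y) coordinates so that
    relations can depend on position as well as appearance."""
    H, W, C = feature_map.shape
    ys, xs = np.mgrid[0:H, 0:W]
    coords = np.stack([xs, ys], axis=-1).reshape(-1, 2) / max(H, W)
    return np.concatenate([feature_map.reshape(-1, C), coords], axis=-1)

def relational_block(entities, Wq, Wk, Wv):
    """One round of dot-product self-attention: every entity attends to every
    other entity, so all pairwise interactions are computed in a single,
    shared, non-local step. Stacking such blocks yields higher-order relations."""
    Q, K, V = entities @ Wq, entities @ Wk, entities @ Wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # (N, N) entity-entity weights
    return attn @ V, attn                           # updated entities, attention map

# Toy usage: a 12 x 12 feature map with 32 channels -> 144 entities.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((12, 12, 32)).astype(np.float32)
ents = extract_entities(fmap)                       # shape (144, 34)
d_in, d_k = ents.shape[-1], 64
Wq, Wk, Wv = (0.05 * rng.standard_normal((d_in, d_k)) for _ in range(3))
updated, attn = relational_block(ents, Wq, Wk, Wv)  # (144, 64), (144, 144)
```

Inspecting the returned attention map is, in spirit, the kind of analysis described above for checking whether the model links a key to its corresponding lock through a computation shared across entities.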
Tables
  • Table1: Mean scores achieved in the StarCraft II mini-games using full action set. ↑ denotes a score that is higher than a StarCraft Grandmaster. Mini-games: (1) Move To Beacon, (2) Collect Mineral Shards, (3) Find And Defeat Zerglings, (4) Defeat Roaches, (5) Defeat Zerglings And Banelings, (6) Collect Minerals And Gas, (7) Build Marines
  • Table2: Shared fixed hyperparameters across mini-games
  • Table3: Fixed MHDPA settings for StarCraft II mini-games
  • Table4: Swept hyperparameters across mini-games
Reference
  • [1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529, 2015.
  • [2] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
  • [3] Andrei A. Rusu, Matej Vecerik, Thomas Rothörl, Nicolas Heess, Razvan Pascanu, and Raia Hadsell. Sim-to-real robot learning from pixels with progressive nets. In Proceedings of the 1st Annual Conference on Robot Learning (CoRL 2017), pages 262–270, 2017.
  • [4] Marta Garnelo, Kai Arulkumaran, and Murray Shanahan. Towards deep symbolic reinforcement learning. arXiv preprint arXiv:1609.05518, 2016.
  • [5] Chiyuan Zhang, Oriol Vinyals, Remi Munos, and Samy Bengio. A study on overfitting in deep reinforcement learning. arXiv preprint arXiv:1804.06893, 2018.
  • [6] Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gershman. Building machines that learn and think like people. Behavioral and Brain Sciences, 40, 2017.
  • [7] Ken Kansky, Tom Silver, David A. Mély, Mohamed Eldawy, Miguel Lázaro-Gredilla, Xinghua Lou, Nimrod Dorfman, Szymon Sidor, Scott Phoenix, and Dileep George. Schema networks: Zero-shot transfer with a generative causal model of intuitive physics. arXiv preprint arXiv:1706.04317, 2017.
  • [8] Saso Dzeroski, Luc De Raedt, and Hendrik Blockeel. Relational reinforcement learning. In Inductive Logic Programming: 8th International Workshop (ILP-98), pages 11–22, 1998.
  • [9] Saso Dzeroski, Luc De Raedt, and Kurt Driessens. Relational reinforcement learning. Machine Learning, 43(1/2):7–52, 2001.
  • [10] Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, et al. Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems, pages 4502–4510, 2016.
  • [11] David Raposo, Adam Santoro, David Barrett, Razvan Pascanu, Timothy Lillicrap, and Peter Battaglia. Discovering objects and their relations from entangled scene representations. arXiv preprint arXiv:1702.05068, 2017.
  • [12] Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, Yujia Li, and Razvan Pascanu. Relational inductive biases, deep learning, and graph networks. arXiv preprint, 2018.
  • [13] Adam Santoro, David Raposo, David G. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Tim Lillicrap. A simple neural network module for relational reasoning. In Advances in Neural Information Processing Systems, pages 4974–4983, 2017.
  • [14] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000–6010, 2017.
  • [15] Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, et al. StarCraft II: A new challenge for reinforcement learning. arXiv preprint arXiv:1708.04782, 2017.
  • [16] Stephen Muggleton and Luc De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming, 19/20:629–679, 1994.
  • [17] Kurt Driessens and Jan Ramon. Relational instance based regression for relational reinforcement learning. In Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), pages 123–130, 2003.
  • [18] Kurt Driessens and Saso Dzeroski. Integrating guidance into relational reinforcement learning. Machine Learning, 57(3):271–304, 2004.
  • [19] M. van Otterlo. Relational representations in reinforcement learning: Review and open problems. 2002.
  • [20] Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. arXiv preprint arXiv:1711.07971, 2017.
  • [21] Nicholas Watters, Andrea Tacchetti, Theophane Weber, Razvan Pascanu, Peter Battaglia, and Daniel Zoran. Visual interaction networks. arXiv preprint arXiv:1706.01433, 2017.
  • [22] Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, and Yichen Wei. Relation networks for object detection. arXiv preprint arXiv:1711.11575, 2017.
  • [23] Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940, 2016.
  • [24] Hanjun Dai, Elias Khalil, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, pages 6351–6361, 2017.
  • [25] Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel. A simple neural attentive meta-learner. In NIPS 2017 Workshop on Meta-Learning, 2017.
  • [26] W. W. M. Kool and M. Welling. Attention solves your TSP. arXiv preprint arXiv:1803.08475, 2018.
  • [27] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.
  • [28] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. In International Conference on Machine Learning, pages 2014–2023, 2016.
  • [29] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • [30] Misha Denil, Sergio Gómez Colmenarejo, Serkan Cabi, David Saxton, and Nando de Freitas. Programmable agents. arXiv preprint arXiv:1706.06383, 2017.
  • [31] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
  • [32] Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, et al. IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. arXiv preprint arXiv:1802.01561, 2018.
  • [33] Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, et al. Population based training of neural networks. arXiv preprint arXiv:1711.09846, 2017.
  • [34] Xinlei Chen, Li-Jia Li, Li Fei-Fei, and Abhinav Gupta. Iterative visual reasoning beyond convolutions. arXiv preprint arXiv:1803.11189, 2018.
  • [35] Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu. FeUdal networks for hierarchical reinforcement learning. arXiv preprint arXiv:1703.01161, 2017.
  • [36] Arthur Guez, Théophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Rémi Munos, and David Silver. Learning to search with MCTSnets. arXiv preprint arXiv:1802.04697, 2018.
  • [37] Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, and Peter W. Battaglia. Metacontrol for adaptive imagination-based optimization. arXiv preprint arXiv:1705.02670, 2017.
  • [38] Razvan Pascanu, Yujia Li, Oriol Vinyals, Nicolas Heess, Lars Buesing, Sebastien Racanière, David Reichert, Théophane Weber, Daan Wierstra, and Peter Battaglia. Learning model-based planning from scratch. arXiv preprint arXiv:1707.06170, 2017.
  • [39] Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adria Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, et al. Imagination-augmented agents for deep reinforcement learning. arXiv preprint arXiv:1707.06203, 2017.
  • [40] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645, 2016.