Building Generalizable Agents with a Realistic and Rich 3D Environment

ICLR 2018 (arXiv: abs/1801.02209).

We propose a new environment, House3D, which contains 45K houses with a diverse set of objects and natural layouts resembling the real world.

Abstract:

Towards bridging the gap between machine and human intelligence, it is of utmost importance to introduce environments that are visually realistic and rich in content. In such environments, one can evaluate and improve a crucial property of practical intelligent systems, namely generalization. In this work, we build House3D, a rich, extens…

Introduction
  • Deep reinforcement learning has shown its strength on multiple games, such as Atari (Mnih et al., 2015) and Go (Silver et al., 2016), far surpassing human performance.
  • Pixel-level variations are applied to the observation signals in order to increase the agent's robustness to unseen environments (Beattie et al., 2016; Higgins et al., 2017; Tobin et al., 2017).
  • Transfer learning is applied to similar tasks but with different rewards (Finn et al., 2017b).
Highlights
  • Deep reinforcement learning has shown its strength on multiple games, such as Atari (Mnih et al., 2015) and Go (Silver et al., 2016), far surpassing human performance.
  • We show that strong generalization requires augmentation at all levels: pixel-level augmentation by domain randomization (Tobin et al., 2017) enhances the agent's robustness to color variations; object-level augmentation forces the agent to learn multiple concepts (20 in total) simultaneously; and scene-level augmentation, where a diverse set of environments is used, enforces generalizability across diverse scenes and mitigates overfitting to particular scenes (see the sketch after this list).
  • We explore domain randomization by generating an additional 180 houses with random object coloring from Esmall, which leads to a total of 200 houses.
  • We propose a new environment, House3D, which contains 45K houses with a diverse set of objects and natural layouts resembling the real world.
  • We note that generalization to unseen environments was rarely studied in previous works.
  • The final performance on unseen environments exceeds baseline methods by over 8%.
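
The paper's augmentation code is not reproduced on this page; the snippet below is only a minimal sketch of the pixel-level idea, written as a per-channel color perturbation of the rendered observation. In House3D the randomization is actually realized by re-coloring objects in the scene before rendering (the 180 re-colored houses above), so this observation-space version is an illustrative approximation; the function name `randomize_colors`, the parameter ranges, and the observation size are assumptions, not the authors' implementation.

```python
import numpy as np

def randomize_colors(obs: np.ndarray, rng: np.random.Generator,
                     offset_scale: float = 0.15,
                     gain_range=(0.8, 1.2)) -> np.ndarray:
    """Apply a random per-channel gain and offset to an RGB observation.

    The scene geometry is untouched; only colors vary between episodes,
    so a policy cannot latch onto absolute color values.
    """
    x = obs.astype(np.float32) / 255.0
    gain = rng.uniform(*gain_range, size=(1, 1, 3))                     # per-channel brightness
    offset = rng.uniform(-offset_scale, offset_scale, size=(1, 1, 3))   # per-channel shift
    x = np.clip(x * gain + offset, 0.0, 1.0)
    return (x * 255.0).astype(np.uint8)

# Example: draw one perturbation per episode and apply it to a rendered frame.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(120, 90, 3), dtype=np.uint8)  # stand-in for a rendered RGB frame
augmented = randomize_colors(frame, rng)
```
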
Results
  • The authors report experimental results for the models on the task of RoomNav. They first compare models with discrete and continuous action spaces and with different input modalities.
  • The authors explain the observations and show that techniques targeting different levels of augmentation improve the success rate of navigation on the test set (a sketch of how these metrics are computed follows this list).
  • These techniques are complementary to each other.
  • The authors train the baseline models in multiple experimental settings.
  • The authors use two training datasets.
  • A held-out dataset Etest, containing 50 houses, is used for testing.
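
Success rate and the average number of steps over successful trials are the two quantities reported for RoomNav (Tables 3 and 4 below). They can be computed from per-episode evaluation logs as in this sketch; the per-episode record format (`success`, `steps`) is an illustrative assumption, not the authors' logging code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Episode:
    success: bool   # did the agent stop inside the target room?
    steps: int      # number of environment steps taken in the episode

def success_rate(episodes: List[Episode]) -> float:
    """Fraction of evaluation episodes that reached the target (cf. Table 3)."""
    return sum(e.success for e in episodes) / len(episodes)

def avg_steps_on_success(episodes: List[Episode]) -> float:
    """Average episode length over successful trials only (cf. Table 4)."""
    lengths = [e.steps for e in episodes if e.success]
    return sum(lengths) / len(lengths) if lengths else float("nan")

# Example with three evaluation episodes.
log = [Episode(True, 34), Episode(False, 100), Episode(True, 58)]
print(success_rate(log))          # 0.666...
print(avg_steps_on_success(log))  # 46.0
```
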
Conclusion
  • The authors propose a new environment, House3D, which contains 45K houses with a diverse set of objects and natural layouts resembling the real world.

    In House3D, the authors teach an agent to accomplish semantic goals.
  • The authors note that generalization to unseen environments was rarely studied in previous works.
  • To this end, the authors quantify the effect of various levels of augmentation, all facilitated by House3D, by means of domain randomization, multi-target training, and the diversity of the environment.
  • The authors hope that House3D, as well as the training techniques, can benefit the whole RL community in building generalizable agents.
Tables
  • Table 1: A summary of popular environments. The attributes include 3D: 3D nature of the rendered objects; Realistic: resemblance to the real world; Large-scale: a large set of environments; Fast: fast rendering speed; and Customizable: flexibility to be customized to other applications
  • Table 2: Statistics of the selected environment sets for RoomNav. RoomType% denotes the percentage of houses containing at least one target room of type RoomType
  • Table 3: Detailed test success rates for the gated-CNN model and the gated-LSTM model with "Mask+Depth" as the input signal, across different instruction concepts (a sketch of the gating mechanism follows this list)
  • Table 4: Average number of steps towards the target over all successful trials, for all the evaluated models with various input signals and different environments
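
The gated-CNN model in Table 3 fuses the navigation target with the visual features via a gating mechanism in the spirit of gated attention (Chaplot et al., 2017; Dhingra et al., 2016): the target concept is embedded and used to modulate the CNN feature map channel-wise. The PyTorch sketch below illustrates only that fusion step; the channel layout of the "Mask+Depth" input, the layer sizes, and the number of discrete actions are assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GatedCNNPolicy(nn.Module):
    """Minimal gated-attention policy sketch: the target concept produces a
    per-channel sigmoid gate that modulates the CNN feature map."""

    def __init__(self, in_channels: int = 2, n_concepts: int = 20,
                 n_actions: int = 12, feat_channels: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, feat_channels, kernel_size=5, stride=2), nn.ReLU(),
        )
        # Embed the target concept into one gate value per feature channel.
        self.gate = nn.Sequential(nn.Embedding(n_concepts, feat_channels), nn.Sigmoid())
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_channels, n_actions),  # action logits (discrete actions assumed)
        )

    def forward(self, obs: torch.Tensor, concept: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(obs)                               # (B, C, H, W)
        g = self.gate(concept).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        return self.head(feats * g)                         # gated features -> logits

# Example: a batch of 4 Mask+Depth observations (1 mask channel + 1 depth channel assumed)
# and 4 target concepts.
policy = GatedCNNPolicy()
obs = torch.rand(4, 2, 120, 90)
concept = torch.randint(0, 20, (4,))
logits = policy(obs, concept)  # shape (4, 12)
```
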
Related work
  • Environments: Table 1 shows the comparison between House3D and the most relevant prior works. There are other simulated environments that focus on different domains, such as OpenAI Gym (Brockman et al., 2016), ParlAI (Miller et al., 2017) for language communication, as well as some strategic game environments (Synnaeve et al., 2016; Tian et al., 2017; Vinyals et al., 2017). Most of these environments are pertinent to one particular aspect of intelligence, such as dialogue or a single type of game, which makes it hard to facilitate the study of more comprehensive problems.

    (Environments compared in Table 1: Atari (Bellemare et al., 2013), OpenAI Universe (Shi et al., 2017), Malmo (Johnson et al., 2016), DeepMind Lab (Beattie et al., 2016), VizDoom (Kempka et al., 2016), AI2-THOR (Zhu et al., 2017), Stanford2D-3D (Armeni et al., 2016), Matterport3D (Chang et al., 2017), and House3D.)

    On the contrary, we focus on building a platform that intersects with multiple research directions, such as object and scene understanding, 3D navigation, and embodied question answering (Das et al., 2017a), while allowing users to customize the level of complexity to their needs.

    We build on SUNCG (Song et al., 2017), a dataset that consists of thousands of diverse synthetic indoor scenes equipped with a variety of objects and layouts. Its visual diversity and rich content open the path to the study of semantic generalization for reinforcement learning agents. Our platform decouples high-performance rendering from data I/O, and thus can use other publicly available 3D scene datasets as well, including AI2-THOR (Zhu et al., 2017), SceneNet RGB-D (McCormac et al., 2017), Stanford 3D (Armeni et al., 2016), and Matterport 3D (Chang et al., 2017).
Reference
  • Jacob Andreas, Dan Klein, and Sergey Levine. Modular multitask reinforcement learning with policy sketches. arXiv preprint arXiv:1611.01796, 2016.
  • Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese. 3D semantic parsing of large-scale indoor spaces. CVPR, 2016.
  • Charles Beattie, Joel Z. Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright, Heinrich Kuttler, Andrew Lefrancq, Simon Green, Victor Valdes, Amir Sadik, et al. DeepMind Lab. arXiv preprint arXiv:1612.03801, 2016.
  • Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research (JAIR), 47:253–279, 2013.
  • Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
  • Simon Brodeur, Ethan Perez, Ankesh Anand, Florian Golemo, Luca Celotti, Florian Strub, Jean Rouat, Hugo Larochelle, and Aaron Courville. HoME: a household multimodal environment. arXiv preprint arXiv:1711.11017, 2017.
  • Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Niessner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang. Matterport3D: Learning from RGB-D data in indoor environments. International Conference on 3D Vision (3DV), 2017.
  • Devendra Singh Chaplot, Kanthashree Mysore Sathyendra, Rama Kumar Pasumarthi, Dheeraj Rajagopal, and Ruslan Salakhutdinov. Gated-attention architectures for task-oriented language grounding. arXiv preprint arXiv:1706.07230, 2017.
  • Paul Christiano, Zain Shah, Igor Mordatch, Jonas Schneider, Trevor Blackwell, Joshua Tobin, Pieter Abbeel, and Wojciech Zaremba. Transfer from simulation to real world through learning deep inverse dynamics model. arXiv preprint arXiv:1610.03518, 2016.
  • Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, and Dhruv Batra. Embodied question answering. arXiv preprint arXiv:1711.11543, 2017a.
  • Abhishek Das, Satwik Kottur, Stefan Lee, Jose M. F. Moura, and Dhruv Batra. Learning cooperative visual dialog agents with deep reinforcement learning. ICCV, 2017b.
  • Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov. Gated-attention readers for text comprehension. arXiv preprint arXiv:1606.01549, 2016.
  • Dhiraj Gandhi, Lerrel Pinto, and Abhinav Gupta. Learning to fly by crashing. IROS, 2017.
  • Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, and Pieter Abbeel. RL2: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016.
  • Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400, 2017a.
  • Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, and Sergey Levine. Generalizing skills with semi-supervised reinforcement learning. ICLR, 2017b.
  • Dieter Fox, Sebastian Thrun, and Wolfram Burgard. Probabilistic Robotics. MIT Press, 2005.
  • Saurabh Gupta, James Davidson, Sergey Levine, Rahul Sukthankar, and Jitendra Malik. Cognitive mapping and planning for visual navigation. CVPR, 2017.
  • Nicolas Heess, Gregory Wayne, David Silver, Tim Lillicrap, Tom Erez, and Yuval Tassa. Learning continuous control policies by stochastic value gradients. In Advances in Neural Information Processing Systems, pp. 2944–2952, 2015.
  • Irina Higgins, Arka Pal, Andrei A. Rusu, Loic Matthey, Christopher P. Burgess, Alexander Pritzel, Matthew Botvinick, Charles Blundell, and Alexander Lerchner. DARLA: Improving zero-shot transfer in reinforcement learning. arXiv preprint arXiv:1707.08475, 2017.
  • Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
  • Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, and Koray Kavukcuoglu. Reinforcement learning with unsupervised auxiliary tasks. arXiv preprint arXiv:1611.05397, 2016.
  • Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144, 2016.
  • Matthew Johnson, Katja Hofmann, Tim Hutton, and David Bignell. The Malmo platform for artificial intelligence experimentation. In IJCAI, pp. 4246–4247, 2016.
  • Karl Moritz Hermann, Felix Hill, Simon Green, Fumin Wang, Ryan Faulkner, Hubert Soyer, David Szepesvari, Wojciech Marian Czarnecki, Max Jaderberg, Denis Teplyashin, Marcus Wainwright, Chris Apps, Demis Hassabis, and Phil Blunsom. Grounded language learning in a simulated 3D world. arXiv preprint arXiv:1706.06551, 2017.
  • Michał Kempka, Marek Wydmuch, Grzegorz Runc, Jakub Toczek, and Wojciech Jaśkowski. ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In Computational Intelligence and Games (CIG), 2016 IEEE Conference on, pp. 1–8. IEEE, 2016.
  • Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • John J. Leonard and Hugh F. Durrant-Whyte. Directed Sonar Sensing for Mobile Robot Navigation. Kluwer Academic Publishers, Norwell, MA, USA, 1992. ISBN 0792392426.
  • Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to-end training of deep visuomotor policies. JMLR, 2016.
  • Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
  • Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv preprint arXiv:1706.02275, 2017.
  • John McCormac, Ankur Handa, Stefan Leutenegger, and Andrew J. Davison. SceneNet RGB-D: Can 5M synthetic images beat generic ImageNet pre-training on indoor segmentation? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2678–2687, 2017.
  • Alexander H. Miller, Will Feng, Adam Fisch, Jiasen Lu, Dhruv Batra, Antoine Bordes, Devi Parikh, and Jason Weston. ParlAI: A dialog research software platform. arXiv preprint arXiv:1705.06476, 2017.
  • Piotr Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andy Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, et al. Learning to navigate in complex environments. arXiv preprint arXiv:1611.03673, 2016.
  • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
  • Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928–1937, 2016.
  • Karthik Narasimhan, Regina Barzilay, and Tommi Jaakkola. Deep transfer in reinforcement learning by language grounding. arXiv preprint arXiv:1708.00133, 2017.
  • Junhyuk Oh, Satinder Singh, Honglak Lee, and Pushmeet Kohli. Zero-shot task generalization with multi-task deep reinforcement learning. arXiv preprint arXiv:1706.05064, 2017.
  • Emilio Parisotto and Ruslan Salakhutdinov. Neural map: Structured memory for deep reinforcement learning. arXiv preprint arXiv:1702.08360, 2017.