BADGR: An Autonomous Self-Supervised Learning-Based Navigation System

Cited by: 8
Other Links: arxiv.org
Keywords:
indoor navigation, real-world environment, tall grass, geometric problem, simultaneous localization and mapping
We presented BADGR, an end-to-end learning-based mobile robot navigation system that can be trained entirely with self-supervised, off-policy data gathered in real-world environments, without any simulation or human supervision, and can improve as it gathers more data

Abstract:

Mobile robot navigation is typically regarded as a geometric problem, in which the robot's objective is to perceive the geometry of the environment in order to plan collision-free paths towards a desired goal. However, a purely geometric view of the world can be insufficient for many navigation problems. For example, a robot navigat…

Introduction
  • Navigation for mobile robots is often regarded as primarily a geometric problem, where the aim is to construct either a local or global map of the environment, and plan a path through this map [1]
  • While this approach has produced excellent results in a range of settings, from indoor navigation [2] to autonomous driving [3], open-world mobile robot navigation often presents challenges that are difficult to address with a purely geometric view of the world.
  • The authors study how autonomous, self-supervised learning from experience can enable a robot to learn about the affordances in its environment using raw visual perception and without human-provided labels or geometric maps
Highlights
  • Navigation for mobile robots is often regarded as primarily a geometric problem, where the aim is to construct either a local or global map of the environment, and plan a path through this map [1]
  • The primary contribution of this work is an end-to-end learning-based mobile robot navigation system that can be trained entirely with self-supervised off-policy data gathered in real-world environments, without any simulation or human supervision
  • Our results demonstrate that our BADGR system can learn to navigate in real-world environments with geometrically distracting obstacles, such as tall grass, and can readily incorporate terrain preferences, such as avoiding bumpy terrain, using only 42 hours of autonomously collected data
  • We developed BADGR, an endto-end learning-based mobile robot navigation system that can be trained entirely with self-supervised, off-policy data gathered in real-world environments, without any simulation or human supervision, and can improve as it gathers more data
  • We demonstrated that our approach can learn to navigate in real-world environments with geometrically distracting obstacles, such as tall grass, and can readily incorporate terrain preferences, such as avoiding bumpy terrain, using only 42 hours of data
  • Investigating methods that train policies which are cautious in novel environments could further decrease the amount of human supervision needed while collecting data
Methods
  • [Results table: compares BADGR, BADGR without the bumpiness cost, a LIDAR-based policy, and a naïve policy on whether the robot successfully reached the goal and on average terrain bumpiness.]
  • The planner scores each candidate action sequence over a horizon of H steps with the reward function

    R(e_{t:t+H}^{0:K}) = − Σ_{t'=t}^{t+H−1} [ R_COLL(e_{t'}^{0:K}) + α_POS · R_POS(e_{t'}^{0:K}) + α_BUM · R_BUM(e_{t'}^{0:K}) ]    (5)

    where R_COLL(e_{t'}^{0:K}) = e_{t'}^{COLL} is the predicted collision term, and R_POS and R_BUM are the goal-position and bumpiness terms, weighted by the coefficients α_POS and α_BUM.
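The planning step implied by Eq. (5), selecting the action sequence with the highest predicted reward, can be sketched as a simple random-shooting planner. The model interface (a callable returning per-timestep event predictions), the action bounds, and the coefficient values below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def reward(pred_events, alpha_pos=1.0, alpha_bum=1.0):
    # Eq. (5): negative sum over the horizon of the collision, goal-position,
    # and bumpiness terms. Dict keys and coefficients are illustrative.
    return -np.sum(pred_events['coll']
                   + alpha_pos * pred_events['pos']
                   + alpha_bum * pred_events['bum'])

def plan(model, obs, num_candidates=100, horizon=8, rng=None):
    # Random-shooting MPC sketch: sample candidate action sequences,
    # predict their event outcomes with the learned model, keep the best.
    rng = np.random.default_rng() if rng is None else rng
    actions = rng.uniform(-1.0, 1.0, size=(num_candidates, horizon))
    rewards = [reward(model(obs, a)) for a in actions]
    return actions[int(np.argmax(rewards))]
```

The real system conditions its predictive model on camera images and uses a more sophisticated optimizer (e.g., a cross-entropy-method variant); the sketch only shows how Eq. (5) turns predictions into an action choice.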
Conclusion
  • The authors presented BADGR, an end-to-end learning-based mobile robot navigation system that can be trained entirely with self-supervised, off-policy data gathered in real-world environments, without any simulation or human supervision, and can improve as it gathers more data.
  • Although BADGR can autonomously gather data with minimal human supervision, the robot periodically requires human assistance, for example if it flips over.
  • The authors believe that solving these and other challenges is crucial for enabling robot learning platforms to learn and act in the real world, and that BADGR is a promising step towards this goal
Summary
  • Objectives:

    The authors' goal is to enable a mobile robot to navigate in real-world environments.
Related work
  • Autonomous mobile robot navigation has been extensively studied in many real-world scenarios, ranging from indoor navigation [2, 10, 11, 12] to outdoor driving [3, 13, 14, 15, 16]. The predominant approach for autonomous navigation is to have the robot build a map, localize itself in the map, and use the map in order to plan and execute actions that enable the robot to achieve its goal. This simultaneous localization and mapping (SLAM) and planning approach [1] has achieved impressive results, and underlies current state-of-the-art autonomous navigation technologies [17, 18]. However, these approaches still have limitations: performance degrades in textureless scenes, expensive sensors are often required, and, most importantly, the system does not get better as the robot acts in the world [19].

    Learning-based methods have shown promise in addressing these limitations by learning from data. One approach to improve upon SLAM methods is to directly estimate the geometry of the scene [20, 21, 22]. However, these methods are limited in that the geometry is only a partial description of the environment. Only learning about geometry can lead to unintended consequences, such as believing that a field of tall grass is untraversable. Semantic-based learning approaches attempt to address the limitations of purely geometric methods by associating the input sensory data with semantically meaningful labels, such as which pixels in an image correspond to traversable or bumpy terrain. However, these methods typically depend on existing SLAM approaches [4, 5, 23, 9, 24] or humans [6, 7] in order to provide the semantic labels, which consequently means these approaches either inherit the limitations of geometric approaches or are not autonomously self-improving. Methods based on imitation learning have been demonstrated on real-world robots [25, 26, 27], but again depend on humans for expert demonstrations, which does not constitute a continuously self-improving system. End-to-end reinforcement learning approaches have shown promise in automating the entire navigation pipeline. However, these methods have typically focused on pure geometric reasoning, require on-policy data, and often utilize simulation due to constraints such as sample efficiency [28, 8, 29, 30, 31, 32]. Prior works have investigated learning navigation policies directly from real-world experience, but typically require a person [30, 33, 34] or SLAM algorithm [35] to gather the data, assume access to the ground-truth robot state [36], learn using low-bandwidth sensors [37], or only perform collision avoidance [8, 38].
Our approach overcomes the limitations of these prior works by designing an end-to-end reinforcement learning approach that directly learns to predict relevant navigation cues with a sample-efficient, off-policy algorithm, and can continue to improve with additional experience via a self-supervised data labelling mechanism that does not depend on humans or SLAM algorithms.
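The self-supervised labelling idea can be illustrated with a short sketch: event labels are computed retroactively from the robot's own logged onboard sensors, so no human annotation or SLAM is involved. The sensor streams and threshold values below are hypothetical, chosen only to show the mechanism:

```python
import numpy as np

def label_events(imu_angvel, lidar_min_dist, bump_thresh=0.5, coll_dist=0.3):
    """Retroactively label each logged timestep from onboard sensors only.

    imu_angvel     : per-timestep IMU angular-velocity magnitude [rad/s]
    lidar_min_dist : per-timestep distance to the nearest LIDAR return [m]
    Thresholds are illustrative assumptions, not the authors' values.
    """
    bumpy = (np.asarray(imu_angvel) > bump_thresh).astype(np.float32)
    collision = (np.asarray(lidar_min_dist) < coll_dist).astype(np.float32)
    return {'bumpy': bumpy, 'collision': collision}
```

Because the labels come from the data the robot already collects, every additional hour of driving yields more training signal for free, which is what lets the system keep improving autonomously.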
Funding
  • This work was supported by ARL DCIST CRA W911NF17-2-0181, the National Science Foundation via IIS-1700697, the DARPA Assured Autonomy program, and Berkeley Deep Drive
  • Gregory Kahn was supported by an NSF Graduate Research Fellowship
Reference
  • S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, 2008.
  • C. Rosen and N. Nilsson. Application of intelligent automata to reconnaissance. Technical report, SRI, 1968.
  • Charles Thorpe, Martial H. Hebert, Takeo Kanade, and Steven A. Shafer. Vision and navigation for the Carnegie Mellon Navlab. TPAMI, 1988.
  • Jeff Michels, Ashutosh Saxena, and Andrew Y. Ng. High speed obstacle avoidance using monocular vision and reinforcement learning. In ICML, 2005.
  • Raia Hadsell, Pierre Sermanet, Jan Ben, Ayse Erkan, Marco Scoffier, Koray Kavukcuoglu, Urs Muller, and Yann LeCun. Learning long-range vision for autonomous off-road driving. Journal of Field Robotics, 2009.
  • Abhinav Valada, Johan Vertens, Ankit Dhall, and Wolfram Burgard. AdapNet: Adaptive semantic segmentation in adverse environmental conditions. In ICRA, 2017.
  • Noriaki Hirose, Amir Sadeghian, Marynel Vázquez, Patrick Goebel, and Silvio Savarese. GoNet: A semi-supervised deep learning approach for traversability estimation. In IROS, 2018.
  • Gregory Kahn, Adam Villaflor, Bosen Ding, Pieter Abbeel, and Sergey Levine. Self-supervised deep reinforcement learning with generalized computation graphs for robot navigation. In ICRA, 2018.
  • Gregory Kahn, Adam Villaflor, Pieter Abbeel, and Sergey Levine. Composable action-conditioned predictors: Flexible off-policy learning for robot navigation. In CoRL, 2018.
  • Jonathan P. How, Brett Bethke, Adrian Frank, Daniel Dale, and John Vian. Real-time indoor autonomous vehicle test environment. IEEE Control Systems Magazine, 2008.
  • Shaojie Shen, Nathan Michael, and Vijay Kumar. Autonomous multi-floor indoor navigation with a computationally constrained MAV. In ICRA, 2011.
  • Liam Paull, Jacopo Tani, Heejin Ahn, Javier Alonso-Mora, Luca Carlone, Michal Čáp, Yu Fan Chen, Changhyun Choi, Jeff Dusek, Yajun Fang, et al. Duckietown: An open, inexpensive and flexible platform for autonomy education and research. In ICRA, 2017.
  • Sebastian Thrun, Mike Montemerlo, Hendrik Dahlkamp, David Stavens, Andrei Aron, James Diebel, Philip Fong, John Gale, Morgan Halpenny, Gabriel Hoffmann, et al. Stanley: The robot that won the DARPA Grand Challenge. Journal of Field Robotics, 2006.
  • Sebastian Scherer, Sanjiv Singh, Lyle Chamberlain, and Mike Elgersma. Flying fast and low among obstacles: Methodology and experiments. The International Journal of Robotics Research, 2008.
  • Chris Urmson et al. Autonomous driving in urban environments: Boss and the Urban Challenge. Journal of Field Robotics, 2008.
  • Paul Furgale and Timothy D. Barfoot. Visual teach and repeat for long-range rover autonomy. Journal of Field Robotics, 2010.
  • Waymo. URL https://waymo.com.
  • Skydio. URL https://www.skydio.com.
  • Jorge Fuentes-Pacheco, José Ruiz-Ascencio, and Juan Manuel Rendón-Mancha. Visual simultaneous localization and mapping: A survey. Artificial Intelligence Review, 2015.
  • Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, and Dacheng Tao. Deep ordinal regression network for monocular depth estimation. In CVPR, 2018.
  • Jia-Ren Chang and Yong-Sheng Chen. Pyramid stereo matching network. In CVPR, 2018.
  • Yan Wang, Wei-Lun Chao, Divyansh Garg, Bharath Hariharan, Mark Campbell, and Kilian Q. Weinberger. Pseudo-LiDAR from visual depth estimation: Bridging the gap in 3D object detection for autonomous driving. In CVPR, 2019.
  • Charles Richter and Nick Roy. Safe visual navigation via deep learning and novelty detection. In RSS, 2017.
  • Lorenz Wellhausen, Alexey Dosovitskiy, René Ranftl, Krzysztof Walas, César Cadena, and Marco Hutter. Where should I walk? Predicting terrain properties from images via self-supervised learning. RA-L, 2019.
  • Stéphane Ross, Narek Melik-Barkhudarov, Kumar Shaurya Shankar, Andreas Wendel, Debadeepta Dey, J. Andrew Bagnell, and Martial Hebert. Learning monocular reactive UAV control in cluttered natural environments. In ICRA, 2013.
  • Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv:1604.07316, 2016.
  • Felipe Codevilla, Matthias Müller, Antonio López, Vladlen Koltun, and Alexey Dosovitskiy. End-to-end driving via conditional imitation learning. In ICRA, 2018.
  • Fereshteh Sadeghi and Sergey Levine. (CAD)2RL: Real single-image flight without a single real image. RSS, 2017.
  • Nikolay Savinov, Alexey Dosovitskiy, and Vladlen Koltun. Semi-parametric topological memory for navigation. In ICLR, 2018.
  • Jake Bruce, Niko Sünderhauf, Piotr Mirowski, Raia Hadsell, and Michael Milford. Learning deployable navigation policies at kilometer scale from a single traversal. In CoRL, 2018.
  • Xiangyun Meng, Nathan Ratliff, Yu Xiang, and Dieter Fox. Neural autonomous navigation with Riemannian motion policy. In ICRA, 2019.
  • Hao-Tien Lewis Chiang, Aleksandra Faust, Marek Fiser, and Anthony Francis. Learning navigation behaviors end-to-end with AutoRL. RA-L, 2019.
  • Antonio Loquercio, Ana I. Maqueda, Carlos R. Del-Blanco, and Davide Scaramuzza. DroNet: Learning to fly by driving. RA-L, 2018.
  • Noriaki Hirose, Fei Xia, Roberto Martín-Martín, Amir Sadeghian, and Silvio Savarese. Deep visual MPC-policy learning for navigation. RA-L, 2019.
  • Dhiraj Gandhi, Lerrel Pinto, and Abhinav Gupta. Learning to fly by crashing. In IROS, 2017.
  • Martin Riedmiller, Mike Montemerlo, and Hendrik Dahlkamp. Learning to drive a real car in 20 minutes. In FBIT, 2007.
  • A. Rupam Mahmood, Dmytro Korenkevych, Gautham Vasan, William Ma, and James Bergstra. Benchmarking reinforcement learning algorithms on real-world robots. In CoRL, 2018.
  • Alex Kendall, Jeffrey Hawke, David Janz, Przemyslaw Mazur, Daniele Reda, John-Mark Allen, Vinh-Dieu Lam, Alex Bewley, and Amar Shah. Learning to drive in a day. In ICRA, 2019.
  • Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 1997.
  • Anusha Nagabandi, Kurt Konolige, Sergey Levine, and Vikash Kumar. Deep dynamics models for learning dexterous manipulation. In CoRL, 2019.
  • Reuven Y. Rubinstein and Dirk P. Kroese. The cross-entropy method: A unified approach to combinatorial optimization, Monte-Carlo simulation and machine learning. Springer Science & Business Media, 2013.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, and David Silver. Rainbow: Combining improvements in deep reinforcement learning. In AAAI, 2018.