Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping

IEEE International Conference on Robotics and Automation (ICRA), 2018.

Keywords:
generative adversarial networks, RGB image, monocular RGB, batch normalization, domain-specific batch normalization (and 11 more)
TL;DR:
We study how randomized simulated environments and domain adaptation methods can be extended to train a grasping system to grasp novel objects from raw monocular RGB images.

Abstract:

Instrumenting and collecting annotated visual grasping datasets to train modern machine learning algorithms can be extremely time-consuming and expensive. An appealing alternative is to use off-the-shelf simulators to render synthetic data for which ground-truth annotations are generated automatically. Unfortunately, models trained purely on simulated data often fail to generalize to the real world.

Introduction
  • Grasping is one of the most fundamental robotic manipulation problems. For virtually any prehensile manipulation behavior, the first step is to grasp the object(s) in question.
  • Learning-based approaches introduce a major challenge: the need for large labeled datasets.
  • These labels might consist of human-provided grasp points [8], or they might be collected autonomously [5], [6].
  • The authors' goal in this work is to show the effect of using simulation and domain adaptation in conjunction with an existing data-driven, monocular vision-based grasping approach.
  • The authors focus on extending the first component to include simulated data in the training set for the grasp prediction network C, leaving the other parts of the system unchanged.
Highlights
  • Grasping is one of the most fundamental robotic manipulation problems.
  • Our work has three main contributions:
    (a) Substantial improvement in grasping performance from monocular RGB images by incorporating synthetic data: we propose approaches for incorporating synthetic data into end-to-end training of vision-based robotic grasping and show that they achieve substantial improvements in performance in the low-data and no-data regimes.
    (b) Detailed experimentation on simulation-to-real-world transfer: our experiments involved 25,704 real grasps of 36 diverse test objects and consider a number of dimensions: the nature of the simulated objects, the kind of randomization used in simulation, and the domain adaptation technique used to adapt simulated images to the real world.
    (c) The first demonstration of effective simulation-to-real-world transfer for purely monocular vision-based grasping: to our knowledge, our work is the first to demonstrate successful simulation-to-real-world transfer for grasping, with generalization to previously unseen natural objects, using only monocular RGB images.
  • The first is a grasp prediction convolutional neural network (CNN) C that accepts a tuple of visual inputs x_i = {x_i^0, x_i^c} and a motion command v_i, and outputs the predicted probability that executing v_i will result in a successful grasp. Here x_i^0 is an image recorded before the robot becomes visible and starts the grasp attempt, and x_i^c is an image recorded at the current timestep. The command v_i is specified in the frame of the robot base and corresponds to a relative change of the end-effector's current position and rotation about the vertical axis. (A minimal illustrative sketch of this interface appears after this list.)
  • Evaluation: This section aims to answer the following research questions: (a) does the use of simulated data from a low-quality simulator improve grasping performance in the real world? (b) is the improvement consistent across varying amounts of real-world labeled samples? (c) how realistic do graspable objects in simulation need to be? (d) does randomizing the virtual environment affect simulation-to-real-world transfer, and which randomization attributes help most? (e) does domain adaptation allow for better utilization of simulated grasping experience? In order to answer these questions, we evaluated a number of different ways of training a grasp success prediction model C with simulated data and domain adaptation.
  • We study grasping from over-the-shoulder monocular RGB images, a challenging setting where depth information and analytic 3D models are not available.
  • This makes simulation-to-real-world transfer especially challenging, since simulated RGB images typically differ much more from real images than simulated depth images do.
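To make the interface of the grasp prediction network C described above concrete, here is a minimal illustrative sketch in PyTorch. The layer sizes, image resolution, and the assumed 5-dimensional motion command are placeholders for illustration only; this is not the authors' architecture, which follows the system of [6].

```python
# Minimal sketch of the grasp prediction network C: inputs are a pre-grasp
# image x^0, a current image x^c, and a motion command v; output is the
# predicted probability of grasp success. All sizes are illustrative.
import torch
import torch.nn as nn

class GraspPredictionCNN(nn.Module):
    def __init__(self, action_dim: int = 5):  # 5-dim command is an assumption
        super().__init__()
        # Shared convolutional trunk over the two stacked RGB images
        # (x^0 and x^c concatenated channel-wise -> 6 input channels).
        self.conv = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # The motion command v (relative end-effector translation plus
        # rotation about the vertical axis) is merged with image features.
        self.action_fc = nn.Sequential(nn.Linear(action_dim, 64), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(64 + 64, 64), nn.ReLU(),
            nn.Linear(64, 1),  # logit of grasp success
        )

    def forward(self, x0, xc, v):
        feats = self.conv(torch.cat([x0, xc], dim=1)).flatten(1)
        return torch.sigmoid(self.head(torch.cat([feats, self.action_fc(v)], dim=1)))

# Usage with dummy data (image resolution is arbitrary here).
model = GraspPredictionCNN()
p = model(torch.rand(2, 3, 128, 128), torch.rand(2, 3, 128, 128), torch.rand(2, 5))
print(p.shape)  # torch.Size([2, 1]), values in (0, 1)
```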
Results
  • In order to answer these questions, the authors evaluated a number of different ways of training the grasp success prediction model C with simulated data and domain adaptation (see the illustrative training sketch after this list).
  • The test set consisted of the objects shown in Fig. 3c, the same objects used in [6], with 6 different objects in each bin for each robot.
  • These objects were not included in the real-world training set and were not used in any way when creating the simulation datasets.
  • Optimal models were selected using the accuracy of C on a held-out validation set of 94,000 real samples.
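The evaluated approaches include domain-adversarial training (DANN) [22] and domain-specific batch normalization (DBN) mixing of simulated and real batches (see Table 1). The sketch below shows what one joint training step with these two ingredients can look like; the tiny feature extractor, loss weighting, and optimizer settings are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch of joint sim+real training with a DANN-style domain
# classifier and domain-specific batch normalization (DBN).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity forward, reversed (scaled) gradient backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class DBNFeatures(nn.Module):
    """Feature extractor with separate batch-norm statistics per domain."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 32, 5, stride=2)
        self.bn = nn.ModuleDict({"sim": nn.BatchNorm2d(32),
                                 "real": nn.BatchNorm2d(32)})
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x, domain):
        h = F.relu(self.bn[domain](self.conv(x)))
        return self.pool(h).flatten(1)            # (B, 32)

features = DBNFeatures()
grasp_head = nn.Linear(32, 1)                     # grasp success logit
domain_head = nn.Linear(32, 1)                    # sim-vs-real logit
opt = torch.optim.Adam(list(features.parameters())
                       + list(grasp_head.parameters())
                       + list(domain_head.parameters()), lr=1e-4)

def train_step(sim_img, sim_label, real_img, real_label, lam=0.1):
    f_sim = features(sim_img, "sim")
    f_real = features(real_img, "real")

    # Task loss: grasp success prediction on both labeled sources.
    task = F.binary_cross_entropy_with_logits(
        grasp_head(torch.cat([f_sim, f_real])).squeeze(1),
        torch.cat([sim_label, real_label]))

    # DANN loss: the domain classifier sees gradient-reversed features, so
    # the feature extractor is pushed toward domain-invariant features.
    feats = GradReverse.apply(torch.cat([f_sim, f_real]), lam)
    dom_target = torch.cat([torch.zeros(len(f_sim)), torch.ones(len(f_real))])
    dann = F.binary_cross_entropy_with_logits(domain_head(feats).squeeze(1),
                                              dom_target)

    opt.zero_grad()
    (task + dann).backward()
    opt.step()
```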
Conclusion
  • The authors examined how simulated data can be incorporated into a learning-based grasping system to improve performance and reduce data requirements.
  • The authors study grasping from over-the-shoulder monocular RGB images, a challenging setting where depth information and analytic 3D models are not available.
  • This makes simulation-to-real-world transfer especially challenging, since simulated RGB images typically differ much more from real images than simulated depth images do.
  • The authors' experiments indicate that their GAN-based pixel-level adaptation method can produce plausible transformations of synthetic images, and that including domain adaptation substantially improves performance in most cases (an illustrative sketch follows).
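The "plausible transformations of synthetic images" mentioned above come from GAN-based pixel-level adaptation, which maps simulated images toward the appearance of real ones. Below is a hedged sketch of that idea, assuming a least-squares GAN objective [36] and a simple content-similarity term; the tiny generator and discriminator are placeholders, not the authors' networks.

```python
# Hedged sketch of pixel-level (GAN-based) adaptation of simulated images.
import torch
import torch.nn as nn

generator = nn.Sequential(            # maps a simulated RGB image to an
    nn.Conv2d(3, 32, 3, padding=1),   # adapted, "realistic-looking" image
    nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
    nn.Tanh(),
)
discriminator = nn.Sequential(        # scores patches as real vs. adapted
    nn.Conv2d(3, 32, 4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    nn.Conv2d(32, 1, 4, stride=2, padding=1),
)
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
mse = nn.MSELoss()                    # least-squares GAN objective

def gan_step(sim_img, real_img):
    fake = generator(sim_img)

    # Discriminator step: real patches -> 1, adapted patches -> 0.
    d_real = discriminator(real_img)
    d_fake = discriminator(fake.detach())
    d_loss = mse(d_real, torch.ones_like(d_real)) \
           + mse(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: fool the discriminator; the similarity term keeps the
    # adapted image close to the simulated input so grasp labels stay valid.
    d_adapted = discriminator(fake)
    g_loss = mse(d_adapted, torch.ones_like(d_adapted)) + 0.1 * mse(fake, sim_img)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return fake
```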
Summary
  • Objectives: One of the aims of this work is to study how final grasping performance is affected by the 3D object models on which the simulated experience is based, the scene appearance and dynamics in simulation, and the way simulated and real experience are integrated for maximal transfer.
Tables
  • Table 1: The effect of our choices for simulated objects and randomization in terms of grasp success. We compared the performance of models trained jointly on grasps of procedural vs. ShapeNet objects with 10% of the real data. Models were trained with DANN and DBN mixing.
  • Table 2: Real grasp performance when no labeled real examples are available. Method names are explained in the text.
  • Table 3: Grasp success on 36 diverse and unseen physical objects for all our methods, trained on different amounts of real-world samples and 8 million simulated samples with procedural objects. Method names are explained in the text.
Related work
  • Robotic grasping is one of the most widely explored areas of manipulation. While a complete survey of grasping is outside the scope of this work, we refer the reader to standard surveys on the subject for a more complete treatment [2]. Grasping methods can be broadly categorized into two groups: geometric methods and data-driven methods. Geometric methods employ analytic grasp metrics, such as force closure [9] or caging [10]. These methods often include appealing guarantees on performance, but typically at the expense of relatively restrictive assumptions. Practical applications of such approaches typically violate one or more of their assumptions. For this reason, data-driven grasping algorithms have risen in popularity in recent years. Instead of relying exclusively on an analytic understanding of the physics of an object, data-driven methods seek to directly predict either human-specified grasp positions [8] or empirically estimated grasp outcomes [5], [6]. A number of methods combine both ideas, for example using analytic metrics to label training data [3], [7].
Acknowledgments
  • The authors also thank Erwin Coumans, Ethan Holly, Dmitry Kalashnikov, Deirdre Quillen, and Ian Wilkes for contributions to the development of our grasping system and supporting infrastructure.
References
  • [1] B. Siciliano and O. Khatib, Springer Handbook of Robotics. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2007.
  • [2] J. Bohg, A. Morales, T. Asfour, and D. Kragic, "Data-driven grasp synthesis: a survey," IEEE Transactions on Robotics, 2014.
  • [3] D. Kappler, J. Bohg, and S. Schaal, "Leveraging Big Data for Grasp Planning," in ICRA, 2015.
  • [4] U. Viereck, A. ten Pas, K. Saenko, and R. Platt, "Learning a visuomotor controller for real world robotic grasping using easily simulated depth images," arXiv:1706.04652, 2017.
  • [5] L. Pinto and A. Gupta, "Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours," in ICRA, 2016.
  • [6] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen, "Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection," IJRR, 2016.
  • [7] J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. A. Ojea, and K. Goldberg, "Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics," in RSS, 2017.
  • [8] I. Lenz, H. Lee, and A. Saxena, "Deep Learning for Detecting Robotic Grasps," IJRR, 2015.
  • [9] A. Bicchi, "On the Closure Properties of Robotic Grasping," IJRR, 1995.
  • [10] A. Rodriguez, M. T. Mason, and S. Ferry, "From Caging to Grasping," IJRR, 2012.
  • [11] A. Saxena, J. Driemeyer, and A. Y. Ng, "Robotic Grasping of Novel Objects using Vision," IJRR, 2008.
  • [12] M. Gualtieri, A. ten Pas, K. Saenko, and R. Platt, "High precision grasp pose detection in dense clutter," in IROS, 2016, pp. 598–605.
  • [13] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World," arXiv:1703.06907, 2017.
  • [14] S. James, A. J. Davison, and E. Johns, "Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task," arXiv:1707.02267, 2017.
  • [15] F. Sadeghi and S. Levine, "CAD2RL: Real single-image flight without a single real image," arXiv:1611.04201, 2016.
  • [16] V. M. Patel, R. Gopalan, R. Li, and R. Chellappa, "Visual domain adaptation: A survey of recent advances," IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 53–69, 2015.
  • [17] G. Csurka, "Domain adaptation for visual applications: A comprehensive survey," arXiv:1702.05374, 2017.
  • [18] B. Sun, J. Feng, and K. Saenko, "Return of frustratingly easy domain adaptation," in AAAI, 2016.
  • [19] B. Gong, Y. Shi, F. Sha, and K. Grauman, "Geodesic flow kernel for unsupervised domain adaptation," in CVPR, 2012.
  • [20] R. Caseiro, J. F. Henriques, P. Martins, and J. Batista, "Beyond the Shortest Path: Unsupervised Domain Adaptation by Sampling Subspaces Along the Spline Flow," in CVPR, 2015.
  • [21] R. Gopalan, R. Li, and R. Chellappa, "Domain Adaptation for Object Recognition: An Unsupervised Approach," in ICCV, 2011.
  • [22] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, "Domain-adversarial training of neural networks," JMLR, 2016.
  • [23] M. Long and J. Wang, "Learning transferable features with deep adaptation networks," in ICML, 2015.
  • [24] K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan, "Domain separation networks," in NIPS, 2016.
  • [25] Y. Taigman, A. Polyak, and L. Wolf, "Unsupervised cross-domain image generation," in ICLR, 2017.
  • [26] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan, "Unsupervised pixel-level domain adaptation with generative adversarial networks," in CVPR, 2017.
  • [27] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb, "Learning from simulated and unsupervised images through adversarial training," in CVPR, 2017.
  • [28] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks," in ICCV, 2017.
  • [29] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in NIPS, 2014.
  • [30] E. Coumans and Y. Bai, "pybullet, a Python module for physics simulation in robotics, games and machine learning," http://pybullet.org/, 2016–2017.
  • [31] A. X. Chang, T. A. Funkhouser, L. J. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu, "ShapeNet: An Information-Rich 3D Model Repository," CoRR, 2015.
  • [32] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in ICML, 2015.
  • [33] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in MICCAI, 2015.
  • [34] D. Ulyanov, A. Vedaldi, and V. Lempitsky, "Instance normalization: The missing ingredient for fast stylization," arXiv:1607.08022, 2016.
  • [35] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks," arXiv:1611.07004, 2016.
  • [36] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley, "Least Squares Generative Adversarial Networks," arXiv:1611.04076, 2016.
  • [37] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," in ECCV, 2016.
  • [38] P. Christiano, Z. Shah, I. Mordatch, J. Schneider, T. Blackwell, J. Tobin, P. Abbeel, and W. Zaremba, "Transfer from simulation to real world through learning deep inverse dynamics model," arXiv:1610.03518, 2016.
  • [39] A. Rajeswaran, S. Ghotra, S. Levine, and B. Ravindran, "EPOpt: Learning robust neural network policies using model ensembles," arXiv:1610.01283, 2016.
  • [40] W. Yu, C. K. Liu, and G. Turk, "Preparing for the unknown: Learning a universal policy with online system identification," arXiv:1702.02453, 2017.