Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation.

CoRL, pp. 651-673, 2018.

Abstract

In this paper, we study the problem of learning vision-based dynamic manipulation skills using a scalable reinforcement learning approach. We study this problem in the context of grasping, a longstanding challenge in robotic manipulation. In contrast to static learning behaviors that choose a grasp point and then execute the desired grasp...

Introduction
  • Manipulation with object interaction represents one of the largest open problems in robotics: intelligently interacting with previously unseen objects in open-world environments requires generalizable perception, closed-loop vision-based control, and dexterous manipulation.
  • While grasping restricts the manipulation problem, it still retains many of its largest challenges: a grasping system should be able to pick up previously unseen objects with reliable and effective grasps, while using realistic sensing and actuation
  • It serves as a microcosm of the larger robotic manipulation problem, providing a challenging and practically applicable model problem for experimenting with generalization and diverse object interaction.
  • This stands in contrast to the kinds of grasping behaviors observed in humans and animals, where the grasp is a dynamical process that tightly interleaves sensing and control.
Highlights
  • Manipulation with object interaction represents one of the largest open problems in robotics: intelligently interacting with previously unseen objects in open-world environments requires generalizable perception, closed-loop vision-based control, and dexterous manipulation
  • We show that our method attains a high success rate across a range of objects not seen during training. Our qualitative experiments show that this high success rate is due to the system adopting a variety of strategies that would be infeasible without closed-loop vision-based control: the learned policies exhibit corrective behaviors, regrasping, probing motions to ascertain the best grasp, non-prehensile repositioning of objects, and other features that are feasible only when grasping is formulated as a dynamic, closed-loop process
  • Our experiments evaluate our learned closed-loop vision-based grasping system to answer the following research questions: (1) How does our method perform, quantitatively, on new objects that were never seen during training? (2) How does its performance compare to a previously proposed self-supervised grasping system that does not explicitly optimize for long-horizon grasp success? (3) What types of manipulation strategies does our method adopt, and does it carry out meaningful, goal-directed pre-grasp manipulations? (4) How do the various design choices in our method affect its performance? The first two questions are addressed through a set of rigorous real-world quantitative experiments, which we discuss in Section 6.1; question (3) is addressed through qualitative experiments, which are discussed in Section 6.2 and shown in the supplementary video and online; and the last question is addressed through a detailed set of ablation studies in both simulation and the real world, which are discussed in Appendices C and A
  • The physical setup for each robot is shown in Fig. 1: the robots are tasked with grasping objects in a bin, using an over-the-shoulder RGB camera and no other sensing
  • We presented a framework for scalable robotic reinforcement learning with raw sensory inputs such as images, based on an algorithm called QT-Opt, a distributed optimization framework, and a combination of off-policy and on-policy training (see the sketch after this list)
  • Our results demonstrate that reinforcement learning with vision-based inputs can scale to large datasets and very large models, and can enable policies that generalize effectively for complex real-world tasks such as grasping
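
QT-Opt is a Q-learning method, so at each control step the policy must find the gripper action that maximizes the learned Q-function over a continuous action space; the reference list includes the cross-entropy method, one common way to perform this sampling-based maximization. The snippet below is a minimal illustrative sketch of such a CEM action-selection step, not the authors' implementation: q_function, the action bounds, and the sample/elite/iteration counts are all placeholders.

    import numpy as np

    def cem_maximize_q(q_function, state, action_low, action_high,
                       num_samples=64, num_elites=6, num_iters=2, seed=0):
        # Pick the action that approximately maximizes Q(state, action) with the
        # cross-entropy method: sample actions, score them, refit a Gaussian to
        # the best-scoring "elite" samples, and repeat.
        rng = np.random.default_rng(seed)
        action_low = np.asarray(action_low, dtype=float)
        action_high = np.asarray(action_high, dtype=float)
        mean = (action_low + action_high) / 2.0
        std = (action_high - action_low) / 2.0
        for _ in range(num_iters):
            actions = rng.normal(mean, std, size=(num_samples, action_low.size))
            actions = np.clip(actions, action_low, action_high)
            scores = np.array([q_function(state, a) for a in actions])
            elites = actions[np.argsort(scores)[-num_elites:]]
            mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
        return mean  # the mean of the final elite set is the chosen action

Maximizing the Q-function by sampling in this way is what allows a Q-function alone to act as a policy over continuous gripper motions, without a separately trained actor network.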
Results
  • The authors' experiments evaluate the learned closed-loop vision-based grasping system to answer the following research questions: (1) How does the method perform, quantitatively, on new objects that were never seen during training? (2) How does its performance compare to a previously proposed self-supervised grasping system that does not explicitly optimize for long-horizon grasp success? (3) What types of manipulation strategies does the method adopt, and does it carry out meaningful, goal-directed pre-grasp manipulations? (4) How do the various design choices in the method affect its performance? The first two questions are addressed through a set of rigorous real-world quantitative experiments, which the authors discuss in Section 6.1; question (3) is addressed through qualitative experiments, which are discussed in Section 6.2 and shown in the supplementary video and online; and the last question is addressed through a detailed set of ablation studies in both simulation and the real world, which are discussed in Appendices C and A.
  • The authors use two separate evaluation protocols, which use challenging objects that were not seen at training time.
  • Although a policy may choose to grasp the same object multiple times, the authors found in practice that each robot made grasp attempts on a variety of objects, without fixating on a single one.
Conclusion
  • The authors presented a framework for scalable robotic reinforcement learning with raw sensory inputs such as images, based on an algorithm called QT-Opt, a distributed optimization framework, and a combination of off-policy and on-policy training
  • The authors apply this framework to the task of grasping, learning closed-loop vision-based policies that attain a high success rate on previously unseen objects, and exhibit sophisticated and intelligent closed-loop behavior, including singulation and pregrasp manipulation, regrasping, and dynamic responses to disturbances.
  • The authors' framework is generic with respect to the task, and extending the approach to other manipulation skills would be an exciting direction for future work
Tables
  • Table1: Quantitative results in terms of grasp success rate on test objects. Policies are evaluated with object replacement (test) and without (bin emptying), with the latter showing success rates on the first 10, 20, and 30 grasps. The variant of our method that uses on-policy finetuning has a failure rate more than four times lower than prior work on the test set, while using substantially fewer grasp attempts for training. The variant that only uses off-policy training also substantially exceeds the performance of the prior method
  • Table2: Off-policy ablation over state representation
  • Table3: Off-policy ablation over discount and reward
  • Table4: Off-policy and on-policy ablation of termination condition
  • Table5: Off-policy performance with and without clipped Double-Q Learning (see the sketch after this list of tables)
  • Table6: Data efficiency
  • Table7: Simulation studies for tuning grasping task parameters
  • Table8: Data efficiency comparison in simulation
  • Table9: Off-policy performance in simulation on datasets with different distributional properties of actions over states. What properties must the data distribution have, and which exploration policies might give rise to that distribution? To that end, we collected two datasets in simulation. The Dscripted dataset was collected by running a randomized scripted policy πscripted, discussed in Appendix B, which averaged 30% grasp success. The second dataset Dexplore was collected by running a suboptimal QT-Opt policy πeval, which also averaged 30% grasp success.
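
Table 5 above compares off-policy training with and without clipped Double-Q learning. In that scheme, due to the cited Fujimoto et al. work, the bootstrap target for a transition takes the minimum of two target-network estimates, which damps Q-value overestimation. Below is a minimal sketch under assumed names: q_target_1, q_target_2, and maximize_action (standing in for the CEM step sketched earlier) are hypothetical, and the discount value is only a placeholder.

    def clipped_double_q_target(reward, done, next_state,
                                q_target_1, q_target_2, maximize_action, gamma=0.9):
        # TD target r + gamma * min_i Q_i(s', a*), with a* chosen by a
        # sampling-based maximizer (e.g. CEM) against one target network.
        if done:
            return reward  # terminal grasp outcomes are not bootstrapped
        best_action = maximize_action(q_target_1, next_state)
        q_min = min(q_target_1(next_state, best_action),
                    q_target_2(next_state, best_action))
        return reward + gamma * q_min

Taking the minimum of two estimates trades a small pessimistic bias for much less overestimation, which is what the ablation in Table 5 probes.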
Related work
  • Reinforcement learning has been applied in the context of robotic control using both low-dimensional [1, 2] and high-dimensional [15, 16] function approximators, including with visual inputs [21, 3]. However, all of these methods focus on learning narrow, individual tasks, and do not evaluate on broad generalization to large numbers of novel test objects. Real-world robotic manipulation requires broad generalization, and indeed much of the research on robotic grasping has sought to achieve such generalization, either through the use of grasp metrics based on first principles [22] or learning [23, 10], with the latter class of methods achieving some of the best results in recent years [8, 7]. However, current grasping systems typically approach the grasping task as the problem of predicting a grasp pose, where the system looks at the scene (typically using a depth camera), chooses the best location at which to grasp, and then executes an open-loop planner to reach that location [5, 6, 7, 8]. In contrast, our approach uses reinforcement learning with deep neural networks, which enables dynamic closed-loop control. This allows our policies to perform pre-grasp manipulation and respond to dynamic disturbances and, crucially, allows us to learn grasping in a generic framework that makes minimal assumptions about the task.
Funding
  • We show that even fully off-policy training can outperform strong baselines based on prior work, while a moderate amount of on-policy finetuning can improve performance to a success rate of 96% on challenging, previously unseen objects
Study subjects and analysis
Training workers: 10
Training workers pull labeled transitions at random from the training buffer and use them to update the Q-function. We use 10 training workers, each of which computes gradients that are sent asynchronously to parameter servers. We found empirically that a large number of gradient steps (up to 15M) was needed to train an effective Q-function, due to the complexity of the task and the large size of the dataset and model.
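
As a rough picture of that layout, the sketch below runs several training workers in threads: each repeatedly samples a random batch of (fake) labeled transitions from a shared buffer and applies a gradient-style update to a shared parameter store. Everything here is a stand-in for illustration; it is not the authors' distributed TensorFlow setup, and fake_gradient merely mimics the shape of a Bellman-error gradient.

    import random
    import threading

    params = [0.0] * 4                     # stand-in for Q-function weights on a parameter server
    params_lock = threading.Lock()
    buffer = [(random.random(), random.random()) for _ in range(1000)]  # fake labeled transitions

    def fake_gradient(batch):
        # Placeholder for the Bellman-error gradient computed from a batch.
        g = sum(x - y for x, y in batch) / len(batch)
        return [g] * len(params)

    def training_worker(steps=100, lr=0.01, batch_size=32):
        for _ in range(steps):
            batch = random.sample(buffer, batch_size)   # pull labeled transitions at random
            grad = fake_gradient(batch)
            with params_lock:                           # apply the update asynchronously
                for i, g in enumerate(grad):
                    params[i] -= lr * g

    workers = [threading.Thread(target=training_worker) for _ in range(10)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(params)

With 10 such worker threads, the updates interleave on the shared parameters, mirroring the asynchronous-gradient structure described above at toy scale.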

References
  • [1] J. Peters and S. Schaal. Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4):682-697, 2008.
  • [2] M. Kalakrishnan, L. Righetti, P. Pastor, and S. Schaal. Learning Force Control Policies for Compliant Manipulation. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011.
  • [3] A. Yahya, A. Li, M. Kalakrishnan, Y. Chebotar, and S. Levine. Collective robot reinforcement learning with distributed asynchronous guided policy search. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017.
  • [4] A. Ghadirzadeh, A. Maki, D. Kragic, and M. Björkman. Deep predictive policy training using reinforcement learning. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017.
  • [5] A. Zeng, S. Song, S. Welker, J. Lee, A. Rodriguez, and T. Funkhouser. Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning. arXiv preprint arXiv:1803.09956, 2018.
  • [6] D. Morrison et al. Cartman: The low-cost Cartesian Manipulator that won the Amazon Robotics Challenge. In IEEE International Conference on Robotics and Automation, 2018.
  • [7] J. Mahler, M. Matl, X. Liu, A. Li, D. V. Gealy, and K. Goldberg. Dex-Net 3.0: Computing Robust Robot Suction Grasp Targets in Point Clouds using a New Analytic Model and Deep Learning. CoRR, abs/1709.06670, 2017. URL http://arxiv.org/abs/1709.06670.
  • [8] A. ten Pas, M. Gualtieri, K. Saenko, and R. Platt. Grasp Pose Detection in Point Clouds. The International Journal of Robotics Research, 36(13-14):1455-1473, 2017.
  • [9] N. Chavan-Dafle and A. Rodriguez. Stable Prehensile Pushing: In-Hand Manipulation with Alternating Sticking Contacts. In IEEE International Conference on Robotics and Automation, 2018.
  • [10] J. Bohg, A. Morales, T. Asfour, and D. Kragic. Data-Driven Grasp Synthesis - A Survey. IEEE Transactions on Robotics, 30(2):289-309, 2014.
  • [11] R. S. Sutton and A. G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998. ISBN 0262193981.
  • [12] G. Tesauro. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play. Neural Computation, March 1994.
  • [13] M. C. Machado, M. G. Bellemare, E. Talvitie, J. Veness, M. J. Hausknecht, and M. Bowling. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents. CoRR, abs/1709.06009, 2017.
  • [14] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. OpenAI Gym, 2016.
  • [15] R. Hafner and M. Riedmiller. Reinforcement learning in feedback control. Machine Learning, 84(1-2), 2011.
  • [16] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra. Continuous Control with Deep Reinforcement Learning. CoRR, abs/1509.02971, 2015. URL http://arxiv.org/abs/1509.02971.
  • [17] Y. Duan, X. Chen, R. Houthooft, J. Schulman, and P. Abbeel. Benchmarking Deep Reinforcement Learning for Continuous Control. In International Conference on Machine Learning, 2016.
  • [18] P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup, and D. Meger. Deep Reinforcement Learning that Matters. CoRR, 2017. URL http://arxiv.org/abs/1709.06560.
  • [19] G. Kahn, A. Villaflor, B. Ding, P. Abbeel, and S. Levine. Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation. In IEEE International Conference on Robotics and Automation, 2018.
  • [20] D. Quillen, E. Jang, O. Nachum, C. Finn, J. Ibarz, and S. Levine. Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods. In IEEE International Conference on Robotics and Automation, 2018.
  • [21] S. Levine, C. Finn, T. Darrell, and P. Abbeel. End-to-end Training of Deep Visuomotor Policies. Journal of Machine Learning Research, 17(39), 2016.
  • [22] J. Weisz and P. K. Allen. Pose Error Robust Grasping from Contact Wrench Space Metrics. In IEEE International Conference on Robotics and Automation, 2012.
  • [23] I. Lenz, H. Lee, and A. Saxena. Deep Learning for Detecting Robotic Grasps. The International Journal of Robotics Research, 34(4-5):705-724, 2015.
  • [24] K. Yu and A. Rodriguez. Realtime State Estimation with Tactile and Visual Sensing: Application to Planar Manipulation. In IEEE International Conference on Robotics and Automation, 2018.
  • [25] U. Viereck, A. ten Pas, K. Saenko, and R. Platt. Learning a visuomotor controller for real world robotic grasping using simulated depth images. In CoRL, 2017.
  • [26] K. Hausman, Y. Chebotar, O. Kroemer, G. S. Sukhatme, and S. Schaal. Regrasping using Tactile Perception and Supervised Policy Learning. In AAAI Symposium on Interactive Multi-Sensory Object Perception for Embodied Agents, 2017.
  • [27] S. Levine, P. Pastor, A. Krizhevsky, and D. Quillen. Learning hand-eye coordination for robotic grasping with large-scale data collection. In International Symposium on Experimental Robotics, 2016.
  • [28] L. Pinto and A. Gupta. Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours. In IEEE International Conference on Robotics and Automation, 2016.
  • [29] D. Morrison, P. Corke, and J. Leitner. Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach. In Robotics: Science and Systems, 2018.
  • [30] V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
  • [31] S. Gu, T. Lillicrap, I. Sutskever, and S. Levine. Continuous Deep Q-learning with Model-based Acceleration. In Proceedings of the International Conference on Machine Learning, 2016.
  • [32] B. T. Polyak and A. B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838-855, 1992.
  • [33] T. Lillicrap, J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra. Continuous control with deep reinforcement learning. In International Conference on Learning Representations, 2016.
  • [34] H. van Hasselt. Double Q-learning. In Advances in Neural Information Processing Systems, 2010.
  • [35] H. van Hasselt, A. Guez, and D. Silver. Deep Reinforcement Learning with Double Q-Learning. In AAAI Conference on Artificial Intelligence, 2016.
  • [36] S. Fujimoto, H. van Hoof, and D. Meger. Addressing Function Approximation Error in Actor-Critic Methods. CoRR, 2018. URL http://arxiv.org/abs/1802.09477.
  • [37] B. Amos, L. Xu, and J. Z. Kolter. Input convex neural networks. In International Conference on Machine Learning, volume 70, pages 146-155, 2017.
  • [38] R. Rubinstein and D. Kroese. The Cross-Entropy Method. Springer-Verlag, 2004.
  • [39] E. Coumans and Y. Bai. PyBullet, a Python module for physics simulation for games, robotics and machine learning. http://pybullet.org, 2016-2018.
  • [40] A. X. Chang, T. A. Funkhouser, L. J. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. ShapeNet: An information-rich 3D model repository. CoRR, abs/1512.03012, 2015. URL http://arxiv.org/abs/1512.03012.
  • [41] S. Ioffe and C. Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In International Conference on Machine Learning, 2015.
  • [42] A. Stooke and P. Abbeel. Accelerated Methods for Deep Reinforcement Learning. CoRR, abs/1803.02811, 2018. URL http://arxiv.org/abs/1803.02811.
  • [43] L. Espeholt, H. Soyer, R. Munos, K. Simonyan, V. Mnih, T. Ward, Y. Doron, V. Firoiu, T. Harley, I. Dunning, S. Legg, and K. Kavukcuoglu. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. CoRR, abs/1802.01561, 2018. URL http://arxiv.org/abs/1802.01561.
  • [44] A. Nair, P. Srinivasan, S. Blackwell, C. Alcicek, R. Fearon, A. D. Maria, V. Panneershelvam, M. Suleyman, C. Beattie, S. Petersen, S. Legg, V. Mnih, K. Kavukcuoglu, and D. Silver. Massively parallel methods for deep reinforcement learning. CoRR, abs/1507.04296, 2015. URL http://arxiv.org/abs/1507.04296.
  • [45] D. Horgan, J. Quan, D. Budden, G. Barth-Maron, M. Hessel, H. van Hasselt, and D. Silver. Distributed Prioritized Experience Replay. In International Conference on Learning Representations, 2018.
  • [46] J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Y. Ng. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, 2012.
  • [47] M. Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL https://www.tensorflow.org/.
Authors
Dmitry Kalashnikov
Julian Ibarz
Alexander Herzog
Deirdre Quillen
Ethan Holly