Learning to Coordinate Manipulation Skills via Skill Behavior Diversification

Youngwoon Lee

ICLR, 2020.

Keywords:
reinforcement learning, hierarchical reinforcement learning, modular framework, skill coordination, bimanual manipulation

Abstract:

When mastering a complex manipulation task, humans often decompose the task into sub-skills of their body parts, practice the sub-skills independently, and then execute the sub-skills together. Similarly, a robot with multiple end-effectors can perform complex tasks by coordinating sub-skills of each end-effector. To realize temporal and …
Introduction
  • Imagine you wish to play Chopin’s Fantaisie Impromptu on the piano. With little prior knowledge about the piece, you would first practice playing the piece with each hand separately.
  • Instead of learning a task all at once, modular approaches (Andreas et al., 2017; Oh et al., 2017; Frans et al., 2018; Lee et al., 2019; Peng et al., 2019; Goyal et al., 2020) suggest learning reusable primitive skills and solving more complex tasks by recombining those skills
  • However, all these approaches focus on either single end-effector manipulation or single-agent locomotion and do not scale to multi-agent problems
Highlights
  • Imagine you wish to play Chopin’s Fantaisie Impromptu on the piano
  • We propose a modular framework that learns to coordinate multiple end-effectors with their primitive skills for various robotics tasks, such as bimanual manipulation
  • We propose a modular framework with skill coordination to tackle the challenge of composing sub-skills across multiple agents
  • To coordinate learned primitive skills, the meta policy predicts not only which skill each agent should execute but also a behavior embedding that controls the chosen primitive skill’s behavior (see the sketch after this list)
  • The experimental results on robotic manipulation and locomotion tasks demonstrate that the proposed framework is able to efficiently learn primitive skills with diverse behaviors and coordinate multiple agents to solve challenging cooperative control tasks
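For concreteness, here is a minimal sketch of a meta policy that, for each end-effector, emits both a discrete skill choice and a continuous behavior embedding, as the highlights describe. The class name, network sizes, and the deterministic tanh embedding head are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MetaPolicy(nn.Module):
    """Hypothetical meta policy: for each end-effector (agent) it selects a
    primitive skill index and a continuous behavior embedding that modulates
    how the chosen skill behaves."""

    def __init__(self, obs_dim, num_agents, num_skills, embed_dim, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # one categorical head (skill index) and one embedding head per agent
        self.skill_heads = nn.ModuleList(
            [nn.Linear(hidden, num_skills) for _ in range(num_agents)])
        self.embed_heads = nn.ModuleList(
            [nn.Linear(hidden, embed_dim) for _ in range(num_agents)])

    def forward(self, obs):
        h = self.trunk(obs)
        skills, embeds = [], []
        for skill_head, embed_head in zip(self.skill_heads, self.embed_heads):
            dist = torch.distributions.Categorical(logits=skill_head(h))
            skills.append(dist.sample())               # which skill this agent runs
            embeds.append(torch.tanh(embed_head(h)))   # how the chosen skill behaves
        return skills, embeds

# Each selected primitive skill then acts conditioned on its embedding, e.g.
# action_k = skill_policies[k][skills[k]](obs_k, embeds[k]) for agent k.
```

The point of the embedding, per the highlights, is that the meta policy gets continuous control over a skill's behavior rather than only a discrete choice of which skill to run.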
Methods
  • The authors address the problem of solving cooperative manipulation tasks that require collaboration between multiple end-effectors or agents.
  • The authors propose a modular and hierarchical framework that learns to coordinate multiple agents with primitive skills to perform a complex task.
  • During primitive skill training, the authors propose to learn a latent behavior embedding, which gives the meta policy control over each primitive skill’s behavior while coordinating skills (a sketch of this training signal follows this list).
  • The authors describe how the meta policy learns to coordinate primitive skills in Section 3.4
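To make the behavior embedding concrete, below is a minimal sketch of a diversity bonus that rewards a primitive skill for reaching states from which its embedding can be recovered, in the spirit of DIAYN-style skill diversification (Eysenbach et al., 2019). The discriminator architecture, the Gaussian form of q(z | s), and how the bonus is weighted against the task reward are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class BehaviorDiscriminator(nn.Module):
    """Hypothetical discriminator q_phi(z | s) that tries to recover the
    behavior embedding z from states visited by the primitive skill."""

    def __init__(self, obs_dim, embed_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, embed_dim))

    def log_prob(self, z, obs):
        # unit-variance Gaussian q(z | s); larger when z is predictable from s
        mean = self.net(obs)
        return torch.distributions.Normal(mean, 1.0).log_prob(z).sum(-1)

def diversity_bonus(disc, obs, z, prior):
    """Variational lower bound on I(z; s): log q(z | s) - log p(z).
    Added, scaled by some coefficient, to the task reward while training a
    primitive skill whose policy is conditioned on z."""
    return disc.log_prob(z, obs) - prior.log_prob(z).sum(-1)
```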
Results
  • The authors' method achieves a 32.3% success rate on the ANT PUSH task, while all baselines fail to compose primitive skills, as shown in Figure 5c and Table 1.
Conclusion
  • The authors propose a modular framework with skill coordination to tackle the challenge of composing sub-skills across multiple agents.
  • The authors use entropy maximization together with mutual information maximization to train controllable primitive skills with diverse behaviors (one plausible form of this combined objective is sketched after this list).
  • The experimental results on robotic manipulation and locomotion tasks demonstrate that the proposed framework is able to efficiently learn primitive skills with diverse behaviors and coordinate multiple agents to solve challenging cooperative control tasks.
  • Acquiring skills without supervision and extending the method to a visual domain are exciting directions for future work
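One plausible way to write out the combined objective mentioned above, assuming the behavior embedding z is drawn from a fixed prior p(z) and a variational discriminator q_phi(z | s) approximates the mutual information term; the weights alpha and beta are illustrative, not the paper's notation:

```latex
\max_{\pi}\;
\mathbb{E}_{z \sim p(z),\ \tau \sim \pi(\cdot \mid s, z)}
\Big[ \textstyle\sum_{t} r(s_t, a_t)
  + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t, z)\big)
  + \beta \big( \log q_{\phi}(z \mid s_t) - \log p(z) \big) \Big]
```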
Tables
  • Table 1: Success rates for all tasks, comparing our method against baselines. Each entry reports the average success rate and standard deviation over 100 runs. The baselines learning from scratch fail to learn complex tasks with multiple agents
  • Table 2: Environment details
  • Table 3: Hyperparameters
Related work
  • Deep reinforcement learning (RL) for continuous control is an active research area. However, learning a complex task either from a sparse reward or from a heavily engineered reward becomes computationally impractical as the target task grows more complicated. Instead of learning from scratch, complex tasks can be tackled by decomposing them into easier and reusable sub-tasks. Hierarchical reinforcement learning splits a task into a sequence of temporally extended meta actions. It often consists of one meta policy (high-level policy) and a set of low-level policies, as in the options framework (Sutton et al., 1999). The meta policy decides which low-level policy to activate, and the chosen low-level policy generates an action sequence until the meta policy switches to another low-level policy (a minimal execution sketch follows this paragraph). Options can be discovered without supervision (Schmidhuber, 1990; Bacon et al., 2017; Nachum et al., 2018; Levy et al., 2019), meta-learned (Frans et al., 2018), pre-defined (Kulkarni et al., 2016; Oh et al., 2017; Merel et al., 2019; Lee et al., 2019), or attained from additional supervision signals (Andreas et al., 2017; Ghosh et al., 2018). However, the options framework is not flexible enough to solve a task that requires simultaneous activation or interpolation of multiple skills, since only one skill can be activated at each time step.
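To make the single-active-skill limitation concrete, here is a minimal options-style rollout in which the meta policy activates exactly one low-level policy at a time. The environment API follows the classic OpenAI Gym interface (Brockman et al., 2016); the helper names and the (skill index, duration) output of the meta policy are assumptions.

```python
def run_options_episode(env, meta_policy, low_level_policies, horizon=500):
    """Options-style rollout: the meta policy picks one low-level policy and
    a duration; only that policy acts until the next meta decision point."""
    obs = env.reset()
    option, steps_left = None, 0
    total_reward = 0.0
    for _ in range(horizon):
        if steps_left == 0:                        # meta decision point
            option, steps_left = meta_policy(obs)  # (skill index, duration)
        action = low_level_policies[option](obs)   # only one skill is active
        obs, reward, done, info = env.step(action)
        total_reward += reward
        steps_left -= 1
        if done:
            break
    return total_reward
```

Because only one option is active per time step, simultaneous or interpolated activation of multiple skills, which the discussed framework targets, does not fit this scheme.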
Funding
  • This project was funded by SKT
Reference
  • Jacob Andreas, Dan Klein, and Sergey Levine. Modular multitask reinforcement learning with policy sketches. In International Conference on Machine Learning, pp. 166–175, 2017.
  • Pierre-Luc Bacon, Jean Harb, and Doina Precup. The option-critic architecture. In Association for the Advancement of Artificial Intelligence, 2017.
  • Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
  • Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, Yuhuai Wu, and Peter Zhokhov. OpenAI Baselines. https://github.com/openai/baselines, 2017.
  • Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. Diversity is all you need: Learning skills without a reward function. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=SJx63jRqFm.
  • Jakob N Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. In Association for the Advancement of Artificial Intelligence, 2018.
  • Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, and John Schulman. Meta learning shared hierarchies. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=SyX0IeWAW.
  • Dibya Ghosh, Avi Singh, Aravind Rajeswaran, Vikash Kumar, and Sergey Levine. Divide-and-conquer reinforcement learning. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJwelMbR-.
  • Anirudh Goyal, Shagun Sodhani, Jonathan Binas, Xue Bin Peng, Sergey Levine, and Yoshua Bengio. Reinforcement learning with competitive ensembles of information-constrained primitives. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=ryxgJTEYDr.
  • Jayesh K Gupta, Maxim Egorov, and Mykel Kochenderfer. Cooperative multi-agent control using deep reinforcement learning. In International Conference on Autonomous Agents and Multi-Agent Systems, pp. 66–83, 2017.
  • Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, and Sergey Levine. Reinforcement learning with deep energy-based policies. In International Conference on Machine Learning, pp. 1352–1361, 2017.
  • Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, and Sergey Levine. Composable deep reinforcement learning for robotic manipulation. In IEEE International Conference on Robotics and Automation, pp. 6244–6251, 2018a.
  • Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pp. 1856–1865, 2018b.
  • Jiechuan Jiang and Zongqing Lu. Learning attentional communication for multi-agent cooperation. In Advances in Neural Information Processing Systems, pp. 7254–7264, 2018.
  • Tejas D Kulkarni, Karthik Narasimhan, Ardavan Saeedi, and Josh Tenenbaum. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In Advances in Neural Information Processing Systems, pp. 3675–3683, 2016.
  • Youngwoon Lee, Shao-Hua Sun, Sriram Somasundaram, Edward Hu, and Joseph J. Lim. Composing complex skills by learning transition policies. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=rygrBhC5tQ.
  • Andrew Levy, Robert Platt, and Kate Saenko. Hierarchical reinforcement learning with hindsight. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ryzECoAcY7.
  • Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pp. 6379–6390, 2017.
  • Pattie Maes and Rodney A Brooks. Learning to coordinate behaviors. In Association for the Advancement of Artificial Intelligence, volume 90, pp. 796–802, 1990.
  • Josh Merel, Arun Ahuja, Vu Pham, Saran Tunyasuvunakool, Siqi Liu, Dhruva Tirumala, Nicolas Heess, and Greg Wayne. Hierarchical visuomotor control of humanoids. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=BJfYvo09Y7.
  • Ofir Nachum, Shixiang Shane Gu, Honglak Lee, and Sergey Levine. Data-efficient hierarchical reinforcement learning. In Advances in Neural Information Processing Systems, pp. 3303–3313, 2018.
  • Ofir Nachum, Michael Ahn, Hugo Ponte, Shixiang Gu, and Vikash Kumar. Multi-agent manipulation via locomotion using hierarchical sim2real. In Conference on Robot Learning, 2019.
  • Junhyuk Oh, Satinder Singh, Honglak Lee, and Pushmeet Kohli. Zero-shot task generalization with multi-task deep reinforcement learning. In International Conference on Machine Learning, pp. 2661–2670, 2017.
  • Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In Advances in Neural Information Processing Systems Autodiff Workshop, 2017.
  • Peng Peng, Ying Wen, Yaodong Yang, Quan Yuan, Zhenkun Tang, Haitao Long, and Jun Wang. Multiagent bidirectionally-coordinated nets: Emergence of human-level coordination in learning to play StarCraft combat games. arXiv preprint arXiv:1703.10069, 2017.
  • Xue Bin Peng, Michael Chang, Grace Zhang, Pieter Abbeel, and Sergey Levine. MCP: Learning composable hierarchical control with multiplicative compositional policies. In Advances in Neural Information Processing Systems, 2019.
  • Ahmed H. Qureshi, Jacob J. Johnson, Yuzhe Qin, Taylor Henderson, Byron Boots, and Michael C. Yip. Composing task-agnostic policies with deep reinforcement learning. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=H1ezFREtwH.
  • Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tom Van de Wiele, Volodymyr Mnih, Nicolas Heess, and Jost Tobias Springenberg. Learning by playing - solving sparse reward tasks from scratch. In International Conference on Machine Learning, 2018.
  • Jürgen Schmidhuber. Towards compositional learning with dynamic neural networks. Institut für Informatik, 1990.
  • John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. In International Conference on Learning Representations, 2016.
  • John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  • Sainbayar Sukhbaatar, Rob Fergus, et al. Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems, pp. 2244–2252, 2016.
  • Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z Leibo, Karl Tuyls, et al. Value-decomposition networks for cooperative multi-agent learning based on team reward. In International Conference on Autonomous Agents and Multi-Agent Systems, pp. 2085–2087, 2018.
  • Richard S Sutton. Temporal credit assignment in reinforcement learning. PhD thesis, University of Massachusetts, 1984.
  • Richard S Sutton, Doina Precup, and Satinder Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2):181–211, 1999.
  • Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033, 2012.