Learning Agile Robotic Locomotion Skills by Imitating Animals

Robotics: Science and Systems (RSS), 2020.

DOI: https://doi.org/10.15607/RSS.2020.XVI.064

Abstract:

Reproducing the diverse and agile locomotion skills of animals has been a longstanding challenge in robotics. While manually-designed controllers have been able to emulate many complex behaviors, building such controllers involves a time-consuming and difficult development process, often requiring substantial expertise of the nuances of each skill.

Introduction
  • Animals can traverse complex environments with remarkable agility, bringing to bear broad repertoires of agile and acrobatic skills.
  • The authors propose an imitation learning framework that enables legged robots to learn agile locomotion skills from real-world animals.
  • Imitating reference motions provides a general approach for robots to perform a rich variety of behaviors that would otherwise be difficult to manually encode into controllers [48, 21, 55, 63].
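
A pose-tracking reward in the style of DeepMimic [44] is a natural concrete form for the motion-imitation objective described above: each term decays exponentially with the squared tracking error between the robot and the reference motion. The sketch below is illustrative only; the function signature, weights, and error scales are assumptions, not the paper's exact formulation.

```python
import numpy as np

def imitation_reward(q, q_ref, v, v_ref, ee, ee_ref, com, com_ref):
    """DeepMimic-style [44] pose-tracking reward: each term lies in (0, 1]
    and saturates as the robot matches the reference motion.
    All weights and scales below are illustrative, not the paper's values."""
    r_pose = np.exp(-5.0 * np.sum((q - q_ref) ** 2))       # joint rotations
    r_vel  = np.exp(-0.1 * np.sum((v - v_ref) ** 2))       # joint velocities
    r_ee   = np.exp(-40.0 * np.sum((ee - ee_ref) ** 2))    # end-effector (foot) positions
    r_com  = np.exp(-10.0 * np.sum((com - com_ref) ** 2))  # root / center-of-mass position
    return 0.5 * r_pose + 0.05 * r_vel + 0.2 * r_ee + 0.25 * r_com
```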
Highlights
  • Animals can traverse complex environments with remarkable agility, bringing to bear broad repertoires of agile and acrobatic skills.
  • While these methods have demonstrated promising results in simulation, agents trained through reinforcement learning are prone to adopting unnatural behaviors that are dangerous or infeasible when deployed in the real world.
  • The superior agility of animals, as compared to robots, might lead one to wonder: can we build more agile robotic controllers with less effort by directly imitating animal motions? In this work, we propose an imitation learning framework that enables legged robots to learn agile locomotion skills from real-world animals.
  • In order to transfer policies learned in simulation to the real world, we propose a sample-efficient adaptation technique, which fine-tunes the behavior of a policy using a learned dynamics representation.
  • We show that by combining motion imitation and latent space adaptation, our system is able to learn a diverse corpus of dynamic locomotion skills that can be transferred to legged robots in the real world.
  • To facilitate transfer to the real world, domain randomization is applied in simulation to train policies that can adapt to different dynamics, as sketched below.
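
To make the learned policies robust to sim-to-real discrepancies, each training episode runs under randomly perturbed dynamics. A minimal sketch follows; the parameter names and ranges are placeholders (Table 1 lists the values actually used in the paper).

```python
import numpy as np

# Placeholder randomization ranges; see Table 1 for the paper's actual values.
PARAM_RANGES = {
    "mass_scale":     (0.8, 1.2),    # multiplicative scaling of link masses
    "motor_strength": (0.8, 1.2),    # scaling of maximum joint torques
    "friction":       (0.35, 1.05),  # ground friction coefficient
    "latency_s":      (0.0, 0.04),   # control latency in seconds
}

def sample_dynamics(rng: np.random.Generator) -> dict:
    """Draw one set of dynamics parameters, to be applied to the simulator
    at the start of each training episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}
```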
Results
  • The authors leverage a class of adaptation techniques, broadly referred to as latent space methods [24, 65, 67], to transfer locomotion policies from simulation to the real world.
  • The authors show that by combining motion imitation and latent space adaptation, the system is able to learn a diverse corpus of dynamic locomotion skills that can be transferred to legged robots in the real world.
  • Given a reference motion, the system uses reinforcement learning to synthesize a policy that enables a robot to reproduce that skill in the real world.
  • To facilitate transfer to the real world, domain randomization is applied in simulation to train policies that can adapt to different dynamics.
  • In the final stage, the policy is transferred to a real robot via a sample-efficient domain adaptation process, which adapts the policy’s behavior using a learned latent dynamics representation.
  • The authors aim to evaluate the effectiveness of the framework on learning a diverse set of quadruped skills, and study how well real-world adaptation can enable more agile behaviors.
  • The authors show that the adaptation method can efficiently transfer policies trained in simulation to the real world with a small number of trials on the physical system.
  • The authors further study the effects of regularizing the latent dynamics encoding with an information bottleneck, and show that this provides a mechanism to trade off between the robustness and adaptability of the learned policies (see the adaptation sketch after this list).
  • Figure 5 lists the skills learned by the robot and summarizes the performance of the policies when deployed in the real world.
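
Concretely, a latent space method of this kind has two pieces: during training, an encoder maps each episode's dynamics parameters to a latent code z that conditions the policy, with an information-bottleneck penalty keeping z close to a standard normal prior; at deployment, z is the only quantity adapted on the real robot. The sketch below illustrates both pieces under stated assumptions: the paper adapts z with advantage-weighted regression [46], whereas a simple hill-climbing search is substituted here for brevity, and evaluate_on_robot is a hypothetical interface.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Information-bottleneck penalty: KL divergence from the encoder's
    diagonal Gaussian N(mu, diag(exp(log_var))) to the prior N(0, I).
    A larger penalty weight yields more robust but less adaptable policies."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def adapt_latent(evaluate_on_robot, z_dim=8, iters=20, pop=8, sigma=0.25, seed=0):
    """Deployment-time adaptation: with the policy pi(a | s, z) frozen, search
    the latent space for the code z that maximizes the measured return on the
    physical robot. evaluate_on_robot(z) -> float runs one rollout conditioned
    on z and returns its average return (hypothetical interface)."""
    rng = np.random.default_rng(seed)
    z, best = np.zeros(z_dim), -np.inf  # start from the prior mean of N(0, I)
    for _ in range(iters):
        candidates = z + sigma * rng.standard_normal((pop, z_dim))
        returns = np.array([evaluate_on_robot(c) for c in candidates])
        if returns.max() > best:
            best, z = returns.max(), candidates[np.argmax(returns)]
    return z
```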
Conclusion
  • The authors' framework is able to learn a diverse set of locomotion skills for the Laikago, including dynamic gaits, such as pacing and trotting, as well as agile turning and spinning motions (Figure 4).
  • To evaluate the policies’ abilities to cope with unfamiliar dynamics, the authors test the policies in out-of-distribution simulated environments, where the dynamics parameters are sampled from a larger range of values than those used during training (see the sketch after this list).
  • By providing the system with different reference motions, the authors are able to learn policies for a diverse set of behaviors with a quadruped robot, which can be efficiently transferred from simulation to the real world.
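
One way to realize the out-of-distribution test referenced above is to widen each training range about its midpoint and sample test dynamics from the result. A small sketch, assuming the PARAM_RANGES dictionary from the domain randomization example and an illustrative widening factor:

```python
def widen(ranges: dict, factor: float = 2.0) -> dict:
    """Expand each (lo, hi) training range about its midpoint by `factor`
    to build out-of-distribution test ranges (the factor is illustrative)."""
    widened = {}
    for name, (lo, hi) in ranges.items():
        mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
        widened[name] = (mid - factor * half, mid + factor * half)
    return widened
```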
Tables
  • Table 1: Dynamics parameters and their respective ranges of values used during training.
  • Table 2: Performance statistics of imitating various skills in simulation using the canonical dynamics parameters. Performance is recorded as the average normalized return between [0, 1].
  • Table 3: Hyper-parameters used during training.
  • Table 4: Performance statistics of imitating various skills in the real world. Performance is recorded as the average normalized return between [0, 1]. Three policies initialized with different random seeds are trained for each combination of skill and method (see the normalization sketch after this list).
  • Table 5: Hyper-parameters used for domain adaptation.
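
The normalized return reported in Tables 2 and 4 maps raw episode returns onto [0, 1]. A plausible reading of that metric, sketched under the assumption that 1 corresponds to tracking the reference motion perfectly for a full episode:

```python
def normalized_return(episode_return: float, r_min: float, r_max: float) -> float:
    """Linearly rescale a raw episode return onto [0, 1]; r_max is the return
    of perfectly tracking the reference for the full episode (assumption)."""
    return (episode_return - r_min) / (r_max - r_min)
```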
Related work
  • The development of controllers for legged locomotion has been an enduring subject of interest in robotics, with a large body of work proposing a variety of control strategies for legged systems [37, 49, 54, 20, 18, 64, 8, 3]. However, many of these methods require in-depth knowledge and manual engineering for each behavior, and as such, the resulting capabilities are ultimately limited by the designer’s understanding of how to model and represent agile and dynamic behaviors. Trajectory optimization and model predictive control can mitigate some of the manual effort involved in the design process, but due to the high-dimensional and complex dynamics of legged systems, reduced-order models are often needed to formulate tractable optimization problems [11, 17, 12, 2]. These simplified abstractions tend to be task-specific, and again require significant insight into the properties of each skill.

    Motion imitation. Imitating reference motions provides a general approach for robots to perform a rich variety of behaviors that would otherwise be difficult to manually encode into controllers [48, 21, 55, 63]. But applications of motion imitation to legged robots have predominantly been limited to behaviors that emphasize upper-body motions, with fairly static lower-body movements, where balance control can be delegated to separate control strategies [39, 27, 30]. In contrast to physical robots, substantially more dynamic skills can be reproduced by agents in simulation [38, 33, 9, 35]. Recently, motion imitation with reinforcement learning has been effective for learning a large repertoire of highly acrobatic skills in simulation [44, 34, 45, 32]. But due to the high sample complexity of RL algorithms and other physical limitations, many of the capabilities demonstrated in simulation have yet to be replicated in the real world.
References
  • [1] Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, and Kevin Murphy. Deep variational information bottleneck. CoRR, abs/1612.00410, 2016. URL http://arxiv.org/abs/1612.00410.
  • [2] Taylor Apgar, Patrick Clary, Kevin Green, Alan Fern, and Jonathan Hurst. Fast online trajectory optimization for the bipedal robot Cassie. In Proceedings of Robotics: Science and Systems, June 2018. doi: 10.15607/RSS.2018.XIV.054.
  • [3] Gerardo Bledt, Matthew J. Powell, Benjamin Katz, Jared Di Carlo, Patrick M. Wensing, and Sangbae Kim. MIT Cheetah 3: Design and control of a robust, dynamic quadruped robot. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2245–2252, 2018.
  • [4] Stephen Butterworth et al. On the theory of filter amplifiers. Wireless Engineer, 7(6):536–541, 1930.
  • [5] Yevgen Chebotar, Ankur Handa, Viktor Makoviychuk, Miles Macklin, Jan Issac, Nathan D. Ratliff, and Dieter Fox. Closing the sim-to-real loop: Adapting simulation randomization with real world experience. CoRR, abs/1810.05687, 2018. URL http://arxiv.org/abs/1810.05687.
  • [6] Ignasi Clavera, Anusha Nagabandi, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, and Chelsea Finn. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=HyztsoC5Y7.
  • [7] Stelian Coros, Philippe Beaudoin, and Michiel van de Panne. Robust task-based control policies for physics-based characters. ACM Trans. Graph. (Proc. SIGGRAPH Asia), 28(5):Article 170, 2009.
  • [8] Stelian Coros, Philippe Beaudoin, and Michiel van de Panne. Generalized biped walking control. ACM Transactions on Graphics, 29(4):Article 130, 2010.
  • [9] Stelian Coros, Andrej Karpathy, Ben Jones, Lionel Reveret, and Michiel van de Panne. Locomotion skills for simulated quadrupeds. ACM Transactions on Graphics, 30(4), 2011.
  • [10] Erwin Coumans and Yunfei Bai. PyBullet, a Python module for physics simulation for games, robotics and machine learning. http://pybullet.org, 2016–2019.
  • [11] Martin de Lasa, Igor Mordatch, and Aaron Hertzmann. Feature-based locomotion controllers. ACM Transactions on Graphics, 29(3), 2010.
  • [12] Jared Di Carlo, Patrick M. Wensing, Benjamin Katz, Gerardo Bledt, and Sangbae Kim. Dynamic locomotion in the MIT Cheetah 3 through convex model-predictive control. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1–9. IEEE, 2018.
  • [13] Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, and Pieter Abbeel. RL²: Fast reinforcement learning via slow reinforcement learning. CoRR, abs/1611.02779, 2016. URL http://arxiv.org/abs/1611.02779.
  • [14] Gen Endo, Jun Morimoto, Takamitsu Matsubara, Jun Nakanishi, and Gordon Cheng. Learning CPG sensory feedback with policy gradient for biped locomotion for a full-body humanoid. In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI-05), pages 1267–1273. AAAI Press, 2005.
  • [15] Roy Featherstone. Rigid Body Dynamics Algorithms. Springer-Verlag, Berlin, Heidelberg, 2007. ISBN 0387743146.
  • [16] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1126–1135. PMLR, 2017. URL http://proceedings.mlr.press/v70/finn17a.html.
  • [17] Christian Gehring, Stelian Coros, Marco Hutter, Dario Bellicoso, Huub Heijnen, Remo Diethelm, Michael Bloesch, Péter Fankhauser, Jemin Hwangbo, Mark Hoepflinger, and Roland Siegwart. Practice makes perfect: An optimization-based approach to controlling agile motions for a quadruped robot. IEEE Robotics & Automation Magazine, 2016. doi: 10.1109/MRA.2015.2505910.
  • [18] Hartmut Geyer, Andre Seyfarth, and Reinhard Blickhan. Positive force feedback in bouncing gaits? Proceedings of the Royal Society B: Biological Sciences, 270:2173–2183, 2003. doi: 10.1098/rspb.2003.2454.
  • [19] Michael Gleicher. Retargetting motion to new characters. In Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '98), pages 33–42, New York, NY, USA, 1998. ACM. doi: 10.1145/280814.280820.
  • [20] A. Goswami. Foot rotation indicator (FRI) point: A new gait planning tool to evaluate postural stability of biped robots. In Proceedings of the 1999 IEEE International Conference on Robotics and Automation, volume 1, pages 47–52, May 1999. doi: 10.1109/ROBOT.1999.769929.
  • [21] David B. Grimes, Rawichote Chalodhorn, and Rajesh P. N. Rao. Dynamic imitation in a humanoid robot through nonparametric probabilistic inference. In Robotics: Science and Systems. The MIT Press, 2006.
  • [22] Tuomas Haarnoja, Aurick Zhou, Sehoon Ha, Jie Tan, George Tucker, and Sergey Levine. Learning to walk via deep reinforcement learning. CoRR, abs/1812.11103, 2018. URL http://arxiv.org/abs/1812.11103.
  • [23] Josiah Hanna and Peter Stone. Grounded action transformation for robot learning in simulation. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI), February 2017.
  • [24] Zhanpeng He, Ryan Julian, Eric Heiden, Hejia Zhang, Stefan Schaal, Joseph J. Lim, Gaurav S. Sukhatme, and Karol Hausman. Zero-shot skill composition and simulation-to-real transfer by learning task representations. CoRR, abs/1810.02422, 2018. URL http://arxiv.org/abs/1810.02422.
  • [25] Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin A. Riedmiller, and David Silver. Emergence of locomotion behaviours in rich environments. CoRR, abs/1707.02286, 2017. URL http://arxiv.org/abs/1707.02286.
  • [26] Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter. Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26), 2019. doi: 10.1126/scirobotics.aau5872.
  • [27] S. Kim, C. Kim, B. You, and S. Oh. Stable whole-body motion generation for humanoid robots to imitate human motions. In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2518–2524, 2009. doi: 10.1109/IROS.2009.5354271.
  • [28] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [29] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. CoRR, abs/1312.6114, 2013.
  • [30] Jonas Koenemann, Felix Burget, and Maren Bennewitz. Real-time imitation of human whole-body motions by humanoids. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 2806–2812, 2014.
  • [31] Nate Kohl and Peter Stone. Policy gradient reinforcement learning for fast quadrupedal locomotion. In ICRA, pages 2619–2624. IEEE, 2004.
  • [32] Seunghwan Lee, Moonseok Park, Kyoungmin Lee, and Jehee Lee. Scalable muscle-actuated human simulation and control. ACM Trans. Graph., 38(4), 2019. doi: 10.1145/3306346.3322972.
  • [33] Yoonsang Lee, Sungeun Kim, and Jehee Lee. Data-driven biped control. ACM Trans. Graph., 29(4), 2010. doi: 10.1145/1778765.1781155.
  • [34] Libin Liu and Jessica Hodgins. Learning basketball dribbling skills using trajectory optimization and deep reinforcement learning. ACM Trans. Graph., 37(4), 2018. doi: 10.1145/3197517.3201315.
  • [35] Libin Liu, Michiel van de Panne, and KangKang Yin. Guided learning of control graphs for physics-based characters. ACM Transactions on Graphics, 35(3), 2016.
  • [36] Kendall Lowrey, Svetoslav Kolev, Jeremy Dao, Aravind Rajeswaran, and Emanuel Todorov. Reinforcement learning for non-prehensile manipulation: Transfer from simulation to physical system. CoRR, abs/1803.10371, 2018. URL http://arxiv.org/abs/1803.10371.
  • [37] Hirofumi Miura and Isao Shimoyama. Dynamic walk of a biped. The International Journal of Robotics Research, 3(2):60–74, 1984.
  • [38] Uldarico Muico, Yongjoon Lee, Jovan Popović, and Zoran Popović. Contact-aware nonlinear control of dynamic characters. ACM Trans. Graph., 28(3), 2009. doi: 10.1145/1531326.1531387.
  • [39] S. Nakaoka, A. Nakazawa, K. Yokoi, H. Hirukawa, and K. Ikeuchi. Generating whole body motions for a biped humanoid robot from captured human dances. In 2003 IEEE International Conference on Robotics and Automation, volume 3, pages 3905–3910, September 2003. doi: 10.1109/ROBOT.2003.1242196.
  • [40] Gerhard Neumann and Jan R. Peters. Fitted Q-iteration by advantage weighted regression. In Advances in Neural Information Processing Systems 21, pages 1177–1184. Curran Associates, Inc., 2009.
  • [41] OpenAI, Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub W. Pachocki, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng, and Wojciech Zaremba. Learning dexterous in-hand manipulation. CoRR, abs/1808.00177, 2018. URL http://arxiv.org/abs/1808.00177.
  • [42] X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 1–8, May 2018. doi: 10.1109/ICRA.2018.8460528.
  • [43] Xue Bin Peng, Glen Berseth, and Michiel van de Panne. Terrain-adaptive locomotion skills using deep reinforcement learning. ACM Trans. Graph., 35(4):81:1–81:12, 2016. doi: 10.1145/2897824.2925881.
  • [44] Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel van de Panne. DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Trans. Graph., 37(4):143:1–143:14, 2018. doi: 10.1145/3197517.3201311.
  • [45] Xue Bin Peng, Angjoo Kanazawa, Jitendra Malik, Pieter Abbeel, and Sergey Levine. SFV: Reinforcement learning of physical skills from videos. ACM Trans. Graph., 37(6), November 2018.
  • [46] Xue Bin Peng, Aviral Kumar, Grace Zhang, and Sergey Levine. Advantage-weighted regression: Simple and scalable off-policy reinforcement learning. CoRR, abs/1910.00177, 2019. URL https://arxiv.org/abs/1910.00177.
  • [47] Lerrel Pinto, James Davidson, Rahul Sukthankar, and Abhinav Gupta. Robust adversarial reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 2817–2826. PMLR, 2017. URL http://proceedings.mlr.press/v70/pinto17a.html.
  • [48] Nancy Pollard, Jessica K. Hodgins, M. J. Riley, and Chris Atkeson. Adapting human motion for the control of a humanoid robot. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), May 2002.
  • [49] M. H. Raibert. Hopping in legged systems: Modeling and simulation for the two-dimensional one-legged case. IEEE Transactions on Systems, Man, and Cybernetics, SMC-14(3):451–463, May 1984. doi: 10.1109/TSMC.1984.6313238.
  • [50] Marc H. Raibert. Trotting, pacing and bounding by a quadruped robot. Journal of Biomechanics, 23:79–98, 1990.
  • [51] Andrei A. Rusu, Matej Večerík, Thomas Rothörl, Nicolas Heess, Razvan Pascanu, and Raia Hadsell. Sim-to-real robot learning from pixels with progressive nets. In Proceedings of the 1st Annual Conference on Robot Learning, volume 78 of Proceedings of Machine Learning Research, pages 262–270. PMLR, 2017. URL http://proceedings.mlr.press/v78/rusu17a.html.
  • [52] Fereshteh Sadeghi and Sergey Levine. CAD2RL: Real single-image flight without a single real image. CoRR, abs/1611.04201, 2016. URL http://arxiv.org/abs/1611.04201.
  • [53] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. CoRR, abs/1707.06347, 2017. URL http://arxiv.org/abs/1707.06347.
  • [54] William J. Schwind and Daniel E. Koditschek. Spring loaded inverted pendulum running: A plant model. 1998.
  • [55] W. Suleiman, E. Yoshida, F. Kanehiro, J. Laumond, and A. Monin. On human motion imitation by humanoid robot. In 2008 IEEE International Conference on Robotics and Automation, pages 2697–2704, May 2008. doi: 10.1109/ROBOT.2008.4543619.
  • [56] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998. ISBN 0262193981.
  • [57] J. Tan, Z. Xie, B. Boots, and C. K. Liu. Simulation-based design of dynamic controllers for humanoid balancing. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2729–2736, 2016. doi: 10.1109/IROS.2016.7759424.
  • [58] Jie Tan, Tingnan Zhang, Erwin Coumans, Atil Iscen, Yunfei Bai, Danijar Hafner, Steven Bohez, and Vincent Vanhoucke. Sim-to-real: Learning agile locomotion for quadruped robots. In Proceedings of Robotics: Science and Systems, Pittsburgh, Pennsylvania, June 2018. doi: 10.15607/RSS.2018.XIV.010.
  • [59] Russ Tedrake, Teresa Weirui Zhang, and H. Sebastian Seung. Stochastic policy gradient reinforcement learning on a simple 3D biped. In Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2004), volume 3, pages 2849–2854. IEEE, 2004.
  • [60] Joshua Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. CoRR, abs/1703.06907, 2017. URL http://arxiv.org/abs/1703.06907.
  • [61] Xingxing Wang. Laikago Pro, Unitree Robotics, 2018. URL http://www.unitree.cc/e/action/ShowInfo.php?classid=6&id=355.
  • [62] Zhaoming Xie, Patrick Clary, Jeremy Dao, Pedro Morais, Jonathan Hurst, and Michiel van de Panne. Learning locomotion skills for Cassie: Iterative design and sim-to-real. In Proceedings of the Conference on Robot Learning (CoRL 2019), 2019.
  • [63] K. Yamane, S. O. Anderson, and J. K. Hodgins. Controlling humanoid robots with human motion data: Experimental validation. In 2010 10th IEEE-RAS International Conference on Humanoid Robots, pages 504–510, December 2010. doi: 10.1109/ICHR.2010.5686312.
  • [64] KangKang Yin, Kevin Loken, and Michiel van de Panne. SIMBICON: Simple biped locomotion control. ACM Trans. Graph., 26(3):Article 105, 2007.
  • [65] Wenhao Yu, Visak C. V. Kumar, Greg Turk, and C. Karen Liu. Sim-to-real transfer for biped locomotion. CoRR, abs/1903.01390, 2019. URL http://arxiv.org/abs/1903.01390.
  • [66] Wenhao Yu, C. Karen Liu, and Greg Turk. Policy transfer with strategy optimization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=H1g6osRcFQ.
  • [67] Wenhao Yu, Jie Tan, Yunfei Bai, Erwin Coumans, and Sehoon Ha. Learning fast adaptation with meta strategy optimization, 2019.
  • [68] He Zhang, Sebastian Starke, Taku Komura, and Jun Saito. Mode-adaptive neural networks for quadruped motion control. ACM Trans. Graph., 37(4):145:1–145:11, 2018. doi: 10.1145/3197517.3201366.