DecAP: Decaying Action Priors for Accelerated Learning of Torque-Based Legged Locomotion Policies
arXiv (2023)
Abstract
Optimal control for legged robots has gone through a paradigm shift from
position-based to torque-based control, owing to the latter's compliant and
robust nature. In parallel to this shift, the community has also turned to Deep
Reinforcement Learning (DRL) as a promising approach to directly learn
locomotion policies for complex real-life tasks. However, most end-to-end DRL
approaches still operate in position space, mainly because learning in torque
space is often sample-inefficient and does not consistently converge to natural
gaits. To address these challenges, we propose a two-stage framework. In the
first stage, we generate our own imitation data by training a position-based
policy, eliminating the need for expert knowledge to design optimal
controllers. The second stage incorporates decaying action priors, a novel
method to enhance the exploration of torque-based policies aided by imitation
rewards. We show that our approach consistently outperforms imitation learning
alone and is robust to scaling these rewards from 0.1x to 10x. We further
validate the benefits of torque control by comparing the robustness of a
position-based policy to a position-assisted torque-based policy on a quadruped
(Unitree Go1) without any domain randomization in the form of external
disturbances during training.
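
The abstract does not spell out the form of the decaying action prior, so the following is only an illustrative sketch under stated assumptions: the prior is taken to be a PD torque that tracks joint targets from the stage-one position-based policy, and it is added to the torque policy's output with a weight that decays exponentially over training steps. The function names, the gains `kp` and `kd`, and the decay constants `gamma` and `k` are all hypothetical and are not taken from the paper.

```python
def decay_weight(step, gamma=0.99, k=100):
    """Assumed schedule: weight starts at 1 and decays as gamma**(step / k)."""
    return gamma ** (step / k)


def pd_torque(q_target, q, dq, kp=20.0, kd=0.5):
    """PD torque per joint, tracking position targets (hypothetical gains)."""
    return [kp * (qt - qi) - kd * dqi for qt, qi, dqi in zip(q_target, q, dq)]


def torque_with_prior(policy_torque, q_target, q, dq, step):
    """Add the decaying prior to the torque policy's raw output.

    q_target is assumed to come from the stage-one position-based policy.
    """
    w = decay_weight(step)
    prior = pd_torque(q_target, q, dq)
    return [tau + w * p for tau, p in zip(policy_torque, prior)]
```

Under these assumptions the prior dominates early in training, biasing exploration toward the position policy's behavior, and then fades so that the final policy outputs torques on its own.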