Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA)(2017)

引用 1140|浏览1299
暂无评分
摘要
Model-free deep reinforcement learning algorithms have been shown to be capable of learning a wide range of robotic skills, but typically require a very large number of samples to achieve good performance. Model-based algorithms, in principle, can provide for much more efficient learning, but have proven difficult to extend to expressive, high-capacity models such as deep neural networks. In this work, we demonstrate that medium-sized neural network models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits to accomplish various complex locomotion tasks. We also propose using deep neural network dynamics models to initialize a model-free learner, in order to combine the sample efficiency of model-based approaches with the high task-specific performance of model-free methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure model-based approach trained on just random action data can follow arbitrary trajectories with excellent sample efficiency, and that our hybrid algorithm can accelerate model-free learning on high-speed benchmark tasks, achieving sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents. Videos can be found at https://sites.google.com/view/mbmf
更多
查看译文
关键词
model-free learning,model-free fine-tuning,model-free deep reinforcement learning algorithms,model-based algorithms,model predictive control,model-based reinforcement learning algorithm,complex locomotion tasks,deep neural network dynamics models,model-free learner,model-based approaches,model-free methods,sample complexity,model-based deep reinforcement learning,robotic skills,MPC,plausible gaits,stable gaits
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要