Optimizing Simulations with Noise-Tolerant Structured Exploration

2018 IEEE International Conference on Robotics and Automation (ICRA), 2018

Abstract
We propose a simple, drop-in, noise-tolerant replacement for the standard finite difference procedure used ubiquitously in blackbox optimization. In our approach, parameter perturbation directions are defined by a family of structured orthogonal matrices. We show that, at the small cost of computing a Fast Walsh-Hadamard/Fourier Transform (FWHT/FFT), such structured finite differences consistently give higher-quality approximations of gradients and Jacobians than vanilla approaches that use coordinate directions or random Gaussian perturbations. We find that trajectory optimizers such as Iterative LQR and Differential Dynamic Programming require fewer iterations to solve several classic continuous control tasks when our methods, rather than standard finite differences, are used to linearize noisy, blackbox dynamics. By embedding structured exploration in a quasi-Newton optimizer (LBFGS), we are able to learn agile walking and turning policies for quadruped locomotion that successfully transfer from simulation to actual hardware. We theoretically justify our methods via bounds on the quality of gradient reconstruction and provide a basis for also applying them to nonsmooth problems.
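The idea of taking finite differences along structured orthogonal directions can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the directions are the rows of a normalized Hadamard matrix multiplied by a random diagonal sign matrix, uses a plain forward difference, and builds the matrix explicitly instead of applying an FWHT. The function names, the step size `h`, and the test objective are all hypothetical.

```python
import numpy as np


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H


def structured_directions(n, rng):
    """Rows of (1/sqrt(n)) * H @ diag(signs); the rows are orthonormal."""
    signs = rng.choice([-1.0, 1.0], size=n)
    # Broadcasting multiplies column j of H by signs[j], i.e. H @ diag(signs).
    return hadamard(n) * signs / np.sqrt(n)


def structured_fd_gradient(f, x, directions, h=1e-3):
    """Forward-difference gradient estimate along orthonormal directions.

    Each directional slope approximates d_i . grad f(x); because the d_i
    form an orthonormal basis, summing slope_i * d_i reconstructs the gradient.
    """
    f0 = f(x)
    slopes = np.array([(f(x + h * d) - f0) / h for d in directions])
    return directions.T @ slopes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    A = np.diag(np.arange(1.0, n + 1))  # hypothetical quadratic objective
    f = lambda x: 0.5 * x @ A @ x
    x = np.ones(n)
    g = structured_fd_gradient(f, x, structured_directions(n, rng))
    print("max abs error vs. true gradient:", np.max(np.abs(g - A @ x)))
```

For large `n`, the explicit matrix product `directions.T @ slopes` could be replaced by a fast transform, which is where the FWHT cost mentioned in the abstract would enter. The sketch keeps the dense matrix only for readability.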
Keywords
trajectory optimizers, noisy dynamics, quasi-Newton optimizer, turning policies, noise-tolerant structured exploration, blackbox optimization, parameter perturbation directions, structured orthogonal matrices, structured finite differences, continuous control tasks, agile walking learning, Fast Walsh-Hadamard/Fourier Transform, FWHT/FFT, drop-in noise-tolerant replacement, quadruped locomotion, deep reinforcement learning, MuJoCo simulator, 3D renderers