High-order local dynamic programming.

ADPRL (2011)

Abstract
We describe a new local dynamic-programming algorithm for solving stochastic continuous Optimal-Control problems. We use a collocation set of cubature vectors both to propagate the state distribution and to perform the Bellman backup. The algorithm can approximate the local policy and cost-to-go with arbitrary function bases. We compare a quadratic-cost-to-go/linear-feedback controller with a cubic-cost-to-go/quadratic-policy controller on a 10-dimensional simulated swimming robot, and find that the higher-order approximation yields a more general policy with a larger basin of attraction.

I. INTRODUCTION

Optimal Control describes the choice of actions which minimizes future costs. While plan-execute approaches to robotic control require a-priori trajectory design, Optimal Control holds the promise of a single, first-principles approach to controller synthesis. The strongest motivation for considering Optimal Control as a framework in robotics is the perceived near-optimality of biological behaviour, along with other attributes observed in nature that would be expected of an optimal controller (1).

Optimal Control problems in their general continuous-state, nonlinear form are notoriously difficult to solve, even approximately. The discrete-state case is addressable by both theoretical and numerical means, but the discretization required to apply such methods to continuous problems does not scale, as the number of states grows exponentially with the dimension. In the continuous nonlinear case, local methods are the only class of algorithms which successfully solve general, high-dimensional Optimal Control problems. These methods are based on the observation that optimal solutions form extremal trajectories, i.e. are solutions to a calculus-of-variations problem. Problems which are sufficiently smooth and deterministic, e.g. space flight, have been successfully solved using methods which characterize the solution on a single optimal trajectory. When the dynamics are stochastic, an open-loop controller cannot suffice, and feedback terms must be incorporated. Second-order local dynamic-programming algorithms like DDP (2) and iterative LQG (3) compute a quadratic approximation of the cost-to-go around a trajectory and, correspondingly, a local linear-feedback controller. While controllers constructed with these algorithms are indeed more robust, they do not take the noise explicitly into account. Instead,
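The cubature-based propagation of the state distribution mentioned in the abstract can be illustrated with a minimal sketch. This uses the generic third-degree spherical-radial cubature rule (2n equally weighted points at scaled columns of a Cholesky factor), which is one standard choice of cubature vectors; it is not taken from the paper, and the function names are illustrative:

```python
import numpy as np

def cubature_points(mu, P):
    """2n third-degree spherical-radial cubature points for a
    Gaussian with mean mu (n,) and covariance P (n, n)."""
    n = mu.size
    L = np.linalg.cholesky(P)                   # P = L @ L.T
    offsets = np.sqrt(n) * np.hstack([L, -L])   # shape (n, 2n)
    return mu[:, None] + offsets                # columns are the points

def propagate(mu, P, f):
    """Push the Gaussian (mu, P) through dynamics f by propagating
    each cubature point and re-fitting mean and covariance."""
    X = cubature_points(mu, P)
    Y = np.stack([f(X[:, i]) for i in range(X.shape[1])], axis=1)
    mu_new = Y.mean(axis=1)                     # equal weights 1/(2n)
    D = Y - mu_new[:, None]
    P_new = D @ D.T / Y.shape[1]
    return mu_new, P_new
```

For linear dynamics f(x) = Ax the rule is exact, recovering the mean A mu and covariance A P Aᵀ; for nonlinear dynamics it gives a third-degree-accurate approximation, which is what makes such point sets usable for both distribution propagation and the Bellman backup along a trajectory.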
Keywords
second order, noise, trajectory, dynamic programming algorithm, dynamic programming, optimal control, higher order, linear systems, calculus of variation, robot control, design optimization, mobile robots, mathematical model, space flight, feedback, continuous optimization, first principle