Performance Bounds for Lambda Policy Iteration
Clinical Orthopaedics and Related Research (2007)
Abstract: We consider the discrete-time, infinite-horizon, discounted stationary optimal control problem formalized by Markov Decision Processes. We study Lambda Policy Iteration, a family of algorithms parameterized by lambda that was originally introduced by Ioffe and Bertsekas. Lambda Policy Iteration generalizes the standard algorithms Value Iteration and Policy Iteration, and is closely related to TD(lambda), introduced by Sutton and Barto. We deepen the original theory developed by Ioffe and Bertsekas by providing convergence rate bounds that generalize the standard bounds for Value Iteration, such as those described by Puterman. We also develop the theory of the algorithm when it is used in an approximate form. In doing so, we extend and unify the separate analyses developed by Munos for Approximate Value Iteration and Approximate Policy Iteration. The main contribution of this paper is to show that Approximate Lambda Policy Iteration is sound.
Key-words: Optimal Control, Reinforcement Learning, Markov Decision Processes, Analysis of Algo-
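The abstract describes Lambda Policy Iteration only in words. As an illustration, here is a minimal sketch of the exact (non-approximate) update as we understand it from the Ioffe and Bertsekas formulation: at iteration k, take a policy pi_{k+1} that is greedy with respect to v_k, then set v_{k+1} = v_k + (I - lambda * gamma * P_pi)^{-1} (T_pi v_k - v_k), where T_pi v = r_pi + gamma * P_pi v. Equivalently, v_{k+1} = (I - lambda * gamma * P_pi)^{-1} (r_pi + (1 - lambda) * gamma * P_pi v_k), so lambda = 0 gives one Bellman backup (Value Iteration) and lambda = 1 gives exact policy evaluation (Policy Iteration). The array layout (P[a, s, t], R[a, s]), the helper names `greedy` and `lambda_policy_iteration`, and the random test MDP are hypothetical choices for this sketch, not the paper's notation or code.

```python
import numpy as np


def greedy(P, R, v, gamma):
    """Return a policy (one action per state) that is greedy with respect to v."""
    q = R + gamma * (P @ v)  # (A, S) array of action values
    return np.argmax(q, axis=0)


def lambda_policy_iteration(P, R, gamma, lam, v0=None, n_iter=100):
    """Illustrative exact lambda Policy Iteration on a finite MDP (hypothetical helper).

    P: (A, S, S) array, P[a, s, t] = probability of moving from s to t under action a.
    R: (A, S) array of expected one-step rewards.
    Each iteration takes pi greedy w.r.t. v, then sets
        v <- v + (I - lam * gamma * P_pi)^{-1} (T_pi v - v).
    """
    n_states = P.shape[1]
    states = np.arange(n_states)
    v = np.zeros(n_states) if v0 is None else np.asarray(v0, dtype=float)
    for _ in range(n_iter):
        pi = greedy(P, R, v, gamma)
        P_pi = P[pi, states, :]  # (S, S): row s is P[pi[s], s, :]
        r_pi = R[pi, states]  # (S,): reward of the chosen action in each state
        t_pi_v = r_pi + gamma * (P_pi @ v)  # Bellman operator T_pi applied to v
        step = np.linalg.solve(np.eye(n_states) - lam * gamma * P_pi, t_pi_v - v)
        v = v + step
    return v, greedy(P, R, v, gamma)


if __name__ == "__main__":
    # Small random MDP, used only to exercise the sketch.
    rng = np.random.default_rng(0)
    n_actions, n_states, gamma = 2, 4, 0.9
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)  # normalize rows into probability distributions
    R = rng.random((n_actions, n_states))
    for lam in (0.0, 0.5, 1.0):
        v, pi = lambda_policy_iteration(P, R, gamma, lam, n_iter=200)
        print(f"lam={lam}: v={np.round(v, 4)}, pi={pi}")
```

On a small MDP all three settings of lam should reach the same optimal values; intermediate values of lam trade the cost of a full policy evaluation against the number of iterations. The sketch uses a direct linear solve for clarity; it does not model the approximate variant analyzed in the paper.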
Keywords
discrete time, reinforcement learning, optimal control, convergence rate, value iteration, markov decision process