Error Propagation for Approximate Policy and Value Iteration (extended version)
Neural Information Processing Systems (2010)
Abstract
We address the question of how the approximation error/Bellman residual at each
iteration of the Approximate Policy/Value Iteration algorithms influences the quality
of the resulting policy. We quantify the performance loss in terms of the L^p norm of the
approximation error/Bellman residual at each iteration. Moreover, we show that
the performance loss depends on the expectation of the squared Radon-Nikodym
derivative of a certain distribution rather than its supremum, as opposed to what
has been suggested by previous results. Our results also indicate that the
contribution of the approximation/Bellman error to the performance loss is more
prominent in the later iterations of API/AVI, while the effect of an error term in the
earlier iterations decays exponentially fast.
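The last claim can be illustrated numerically. The following sketch (not the paper's algorithm; a toy single-action MDP with hypothetical transition matrix and rewards) runs value iteration and injects a fixed error vector at one chosen iteration. Because the Bellman operator is a gamma-contraction, an error injected at iteration k is shrunk by a factor of roughly gamma^(K-1-k) by the final iteration K, so early errors fade while late errors survive almost intact.

```python
import numpy as np

gamma = 0.9          # discount factor (assumed for illustration)
K = 20               # number of value-iteration steps
# Hypothetical 2-state MDP with a single action:
P = np.array([[0.0, 1.0], [1.0, 0.0]])  # transition matrix
r = np.array([1.0, 0.0])                # reward vector

def avi(error_at=None, eps=1.0):
    """Run value iteration, optionally adding a constant error at one step."""
    V = np.zeros(2)
    for k in range(K):
        V = r + gamma * P @ V            # exact Bellman backup
        if k == error_at:
            V = V + eps                  # injected approximation error
    return V

V_exact = avi()
for k in [0, 10, 19]:
    dev = np.max(np.abs(avi(error_at=k) - V_exact))
    print(f"error at iteration {k}: final sup-norm deviation = {dev:.4f}")
```

Since P here is a permutation matrix, the final deviation is exactly eps * gamma^(K-1-k): an error at iteration 0 is attenuated by 0.9^19 (about 0.135), while the same error at the last iteration passes through at full magnitude, matching the abstract's observation that later errors dominate.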
Keywords
error propagation, approximation error, value iteration