Policy Iteration with Gaussian Process based Value Function Approximation

semanticscholar(2020)

Abstract
In this work, we explore the use of Gaussian processes (GPs) as function approximators for Reinforcement Learning (RL), building estimates of the value function and Q-function with GPs. Such a representation allows us to learn Q-functions, and thereby policies, conditioned on uncertainty in the system dynamics, and can help to sample-efficiently transfer policies learned in simulation to hardware. We use two approaches from Engel et al. [1], GPTD and GPSARSA, to build approximate value functions and Q-functions, respectively. For simple, continuous problems we found these to be effective at approximating the value function and the Q-function, but for discontinuous landscapes the performance of GPSARSA deteriorates, even on simple problems. As the problem complexity increases, for example for an inverted pendulum, we find that both approaches are extremely sensitive to the GP hyperparameters and do not scale well. We experiment with a sparse variant of the algorithm, but find that GPSARSA still converges to poor solutions. Our experiments show that while GPTD and GPSARSA are appealing theoretical formulations, they are not suitable for complex domains without extensive hyperparameter tuning.
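
To make the idea of GP-based Q-function approximation concrete, here is a minimal sketch of fitting a GP to SARSA-style targets over state-action pairs, in the spirit of (but not identical to) GPSARSA as described in the abstract. The toy environment, kernel choice, and all hyperparameters below are illustrative assumptions, not the authors' setup.

```python
# Sketch only: GP regression of Q(s, a) from SARSA-style targets.
# The environment, kernel, and hyperparameters are assumed for illustration.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
gamma = 0.95  # discount factor (assumed)

# Hypothetical transition batch (s, a, r, s', a') collected by some behavior policy.
# States and actions are 1-D here purely to keep the example small.
S  = rng.uniform(-1, 1, size=(200, 1))
A  = rng.uniform(-1, 1, size=(200, 1))
R  = -(S**2 + 0.1 * A**2).ravel()          # toy quadratic cost
S2 = np.clip(S + 0.1 * A, -1, 1)           # toy deterministic dynamics
A2 = rng.uniform(-1, 1, size=(200, 1))     # next actions from the same policy

X  = np.hstack([S, A])    # GP inputs: state-action pairs
X2 = np.hstack([S2, A2])

# GP prior over Q(s, a); the RBF length-scale is exactly the kind of
# hyperparameter the paper reports these methods are sensitive to.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5) + WhiteKernel(1e-2),
                              alpha=1e-6, normalize_y=True)

# Fixed-point iteration on SARSA targets: y = r + gamma * Q(s', a').
y = R.copy()
for _ in range(10):
    gp.fit(X, y)
    y = R + gamma * gp.predict(X2)

# The GP posterior gives both a Q-value estimate and an uncertainty estimate.
q_mean, q_std = gp.predict(np.array([[0.0, 0.0]]), return_std=True)
print(f"Q(0, 0) ~ {q_mean[0]:.3f} +/- {q_std[0]:.3f}")
```

The posterior standard deviation returned alongside the mean is what makes a GP representation attractive here: the learned Q-values carry an uncertainty estimate that can be conditioned on, e.g., uncertain system dynamics when transferring policies from simulation to hardware.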