Off-policy reinforcement learning with Gaussian processes
IEEE/CAA Journal of Automatica Sinica, Volume 1, Issue 3, 2014, Pages 227-238.
An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented in both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established to guarantee convergence of off-policy GPQ in the...
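To make the idea of a GP model of the Q function concrete, the following is a minimal batch sketch, not the paper's GPQ algorithm. It assumes a hypothetical 5-state chain MDP, a squared-exponential kernel with hand-picked length scale and noise, and a fitted-Q-iteration-style loop in which the GP posterior mean is refit to Q-learning targets; all names (`rbf`, `solve`, `gp_mean`, `step`) and all constants are illustrative assumptions.

```python
import math

GAMMA = 0.8          # discount factor (assumed value)
N_STATES = 5         # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (-1, 1)    # move left or right

def rbf(x, y, length_scale=1.0):
    """Squared-exponential kernel between two (state, action) feature tuples."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_mean(X, y, X_query, noise=1e-4):
    """GP posterior mean at X_query given observations y at inputs X."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    alpha = solve(K, y)
    return [sum(rbf(q, x) * w for x, w in zip(X, alpha)) for q in X_query]

def step(s, a):
    """Deterministic chain dynamics: reward 1 on reaching the last state."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

# Off-policy batch: every (state, action) pair, independent of any target policy.
batch = [(s, a) + step(s, a) for s in range(N_STATES - 1) for a in ACTIONS]
X = [(float(s), float(a)) for s, a, _, _, _ in batch]
index = {x: i for i, x in enumerate(X)}

targets = [0.0] * len(X)
for _ in range(60):
    q_hat = gp_mean(X, targets, X)  # current GP estimate of Q on the batch
    targets = [
        r if done else
        r + GAMMA * max(q_hat[index[(float(s2), float(b))]] for b in ACTIONS)
        for _, _, s2, r, done in batch
    ]

q_hat = gp_mean(X, targets, X)
greedy = {
    s: max(ACTIONS, key=lambda a: q_hat[index[(float(s), float(a))]])
    for s in range(N_STATES - 1)
}
print(greedy)
```

The kernel length scale and noise level here were chosen so that the GP nearly interpolates its targets on this small problem; whether such a loop converges in general depends on these hyperparameters, which is the question the abstract says the paper addresses.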