Off-policy reinforcement learning with Gaussian processes

IEEE/CAA Journal of Automatica Sinica, Volume 1, Issue 3, 2014, Pages 227-238.

DOI: https://doi.org/10.1109/JAS.2014.7004680

Abstract:

An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented in both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established to guarantee convergence of off-policy GPQ in the...
