Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation

Other Links: arxiv.org

Abstract:

In reinforcement learning, it is typical to use the empirically observed transitions and rewards to estimate the value of a policy via either model-based or Q-fitting approaches. Although straightforward, these techniques in general yield biased estimates of the true value of the policy. In this work, we investigate the potential for st...
