Adaptive Estimator Selection for Off-Policy Evaluation

Srinath Pavithra

ICML, pp. 9196-9205, 2020.

We consider episodic reinforcement learning where the agent interacts with the environment in episodes of length H

Abstract:

We develop a generic data-driven method for estimator selection in off-policy policy evaluation settings. We establish a strong performance guarantee for the method, showing that it is competitive with the oracle estimator, up to a constant factor. Via in-depth case studies in contextual bandits and reinforcement learning, we demonstrat...
