Adaptive Estimator Selection for Off-Policy Evaluation
ICML, pp. 9196-9205, 2020.
We consider episodic reinforcement learning where the agent interacts with the environment in episodes of length H.
We develop a generic data-driven method for estimator selection in off-policy policy evaluation settings. We establish a strong performance guarantee for the method, showing that it is competitive with the oracle estimator, up to a constant factor. Via in-depth case studies in contextual bandits and reinforcement learning, we demonstrat...