Corruption Robust Exploration in Episodic Reinforcement Learning
Abstract:
We initiate the study of multi-stage episodic reinforcement learning under adversarial manipulations in both the rewards and the transition probabilities of the underlying system. Existing efficient algorithms heavily rely on the "optimism under uncertainty" principle which dictates their behavior and does not allow flexibility to perfo...