A Reduction from Reinforcement Learning to No-Regret Online Learning
AISTATS, pp. 3514-3524, 2019.
We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "any" online algorithm with sublinear regret can generate policies with provable performance guarantees. This new perspective decouples the RL problem into two parts: regret minimization and function...