A Reduction from Reinforcement Learning to No-Regret Online Learning

AISTATS, pp. 3514-3524, 2019.

Abstract:

We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "any" online algorithm with sublinear regret can generate policies with provable performance guarantees. This new perspective decouples the RL problem into two parts: regret minimization and function...
