AlgaeDICE: Policy Gradient from Arbitrary Experience

Abstract:

In many real-world applications of reinforcement learning (RL), interactions with the environment are limited due to cost or feasibility. This presents a challenge to traditional RL algorithms since the max-return objective involves an expectation over on-policy samples. We introduce a new formulation of max-return optimization that all...
