DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections

NeurIPS, pp. 2315-2325, 2019.

Abstract:

In many real-world reinforcement learning applications, access to the environment is limited to a fixed dataset, instead of direct (online) interaction with the environment. When using this data for either evaluation or training of a new policy, accurate estimates of discounted stationary distribution ratios -- correction terms which qu...
