DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections

Advances in Neural Information Processing Systems 32 (NIPS 2019), 2019.

Abstract:

In many real-world reinforcement learning applications, access to the environment is limited to a fixed dataset, instead of direct (online) interaction with the environment. When using this data for either evaluation or training of a new policy, accurate estimates of discounted stationary distribution ratios, correction terms which quanti...
