DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Advances in Neural Information Processing Systems 32 (NIPS 2019), 2019.
In many real-world reinforcement learning applications, access to the environment is limited to a fixed dataset rather than direct (online) interaction with the environment. When using this data for either evaluation or training of a new policy, accurate estimates of discounted stationary distribution ratios, correction terms which quanti...