Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog


Abstract:

Most deep reinforcement learning (RL) systems are not able to learn effectively from off-policy data, especially if they cannot explore online in the environment. These are critical shortcomings for applying RL to real-world problems where collecting data is expensive, and models must be tested offline before being deployed to interact ...
