Positive-Unlabeled Reward Learning

Denil Misha
Other Links: arxiv.org

Abstract:

Learning reward functions from data is a promising path towards scalable Reinforcement Learning (RL) for robotics. However, a major challenge in training agents from learned reward models is that the agent can learn to exploit errors in the reward model to achieve high-reward behaviors that do not correspond to the intended ta...
