Positive-Unlabeled Reward Learning
Abstract:
Learning reward functions from data is a promising path towards achieving scalable Reinforcement Learning (RL) for robotics. However, a major challenge in training agents from learned reward models is that the agent can learn to exploit errors in the reward model to achieve high-reward behaviors that do not correspond to the intended ta...