Loss is its own Reward: Self-Supervision for Reinforcement Learning
ICLR, 2017 (arXiv:1612.07307).
Reinforcement learning optimizes policies for expected cumulative reward. Need the supervision be so narrow? Reward is delayed and sparse for many tasks, making it a difficult and impoverished signal for end-to-end optimization. To augment reward, we consider a range of self-supervised tasks that incorporate states, actions, and successor...