Loss is its own Reward: Self-Supervision for Reinforcement Learning

Parsa Mahmoudieh
Max Argus

ICLR 2017 (arXiv:1612.07307).

Abstract:

Reinforcement learning optimizes policies for expected cumulative reward. Need the supervision be so narrow? Reward is delayed and sparse for many tasks, making it a difficult and impoverished signal for end-to-end optimization. To augment reward, we consider a range of self-supervised tasks that incorporate states, actions, and successor...
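
The abstract describes training with self-supervised losses in addition to the reward. The sketch below is a minimal illustration under stated assumptions, not the paper's method: a discrete-action policy trained with REINFORCE, plus one hypothetical auxiliary task, inverse dynamics (classifying which action was taken between a state and its successor state). The function names, the `aux_weight` coefficient, and the choice of task are illustrative only. It is written in plain Python so it only computes loss values; a real agent would differentiate these terms with an autodiff framework.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one vector of logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    # Return-to-go G_t = r_t + gamma * G_{t+1}, computed backwards in time.
    returns: List[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def policy_gradient_loss(policy_logits, actions, rewards, gamma=0.99) -> float:
    # REINFORCE surrogate: mean over steps of -log pi(a_t | s_t) * G_t.
    # With all-zero rewards every G_t is 0, so this term carries no signal.
    returns = discounted_returns(rewards, gamma)
    total = 0.0
    for logits, a, g in zip(policy_logits, actions, returns):
        total += -log_softmax(logits)[a] * g
    return total / len(actions)


def inverse_dynamics_loss(action_logits, actions) -> float:
    # Hypothetical self-supervised task: cross-entropy for predicting a_t from
    # (s_t, s_{t+1}). Labels come from the agent's own transitions, not reward.
    total = 0.0
    for logits, a in zip(action_logits, actions):
        total += -log_softmax(logits)[a]
    return total / len(actions)


def total_loss(policy_logits, inv_logits, actions, rewards,
               aux_weight=0.1, gamma=0.99) -> float:
    # Reward-driven loss plus a weighted self-supervised loss
    # (aux_weight is an assumed, illustrative hyperparameter).
    return (policy_gradient_loss(policy_logits, actions, rewards, gamma)
            + aux_weight * inverse_dynamics_loss(inv_logits, actions))


if __name__ == "__main__":
    # Sparse-reward episode: no reward at any step, uniform 2-action logits.
    uniform = [[0.0, 0.0]] * 3
    acts = [0, 1, 0]
    zero_rewards = [0.0, 0.0, 0.0]
    print("policy-gradient loss:", policy_gradient_loss(uniform, acts, zero_rewards))
    print("inverse-dynamics loss:", inverse_dynamics_loss(uniform, acts))
    print("total loss:", total_loss(uniform, uniform, acts, zero_rewards))
```

With all-zero rewards the policy-gradient term is exactly zero, while the auxiliary term still equals log 2 for uniform logits. This mirrors the motivation in the abstract: a sparse reward can leave the learner with no signal, whereas a self-supervised loss on the agent's own experience still provides one.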