The Mirage of Action-Dependent Baselines in Reinforcement Learning

ICML 2018, pp. 5015–5024.

DOI: https://doi.org/10.17863/CAM.23539

Abstract:

Policy gradient methods are a widely used class of model-free reinforcement learning algorithms, where a state-dependent baseline is used to reduce gradient estimator variance. Several recent papers extend the baseline to depend on both the state and action and suggest that this significantly reduces variance and improves sample efficiency…
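For context, the following is a minimal sketch of the two estimators the abstract contrasts, written in standard policy-gradient notation (policy \pi_\theta, return estimate \hat{Q}, baseline b). The notation and the exact form of the correction term follow the general action-dependent-baseline literature and are not quoted from this paper.

% Score-function policy gradient with a state-dependent baseline b(s).
% Subtracting b(s) leaves the estimator unbiased because
% E_{a ~ pi_theta(.|s)} [ \nabla_\theta \log \pi_\theta(a|s) ] = 0.
\nabla_\theta J(\theta)
  = \mathbb{E}_{s,\, a \sim \pi_\theta(\cdot \mid s)}
    \!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\,
             \big( \hat{Q}(s,a) - b(s) \big) \right]

% With an action-dependent baseline b(s,a), the subtracted term is no longer
% mean-zero under pi_theta, so an analytic correction term is added back to
% preserve unbiasedness (treating b as independent of theta):
\nabla_\theta J(\theta)
  = \mathbb{E}_{s,\, a \sim \pi_\theta(\cdot \mid s)}
    \!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\,
             \big( \hat{Q}(s,a) - b(s,a) \big) \right]
  + \mathbb{E}_{s}\!\left[ \nabla_\theta\, \mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\!\big[ b(s,a) \big] \right]

The variance gap between these two forms is the quantity the paper examines.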
