Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods

CoRL, pp. 1379-1394, 2019.

Abstract:

Policy gradient methods have demonstrated success in reinforcement learning tasks that have high-dimensional continuous state and action spaces. However, policy gradient methods are also notoriously sample inefficient. This can be attributed, at least in part, to the high variance in estimating the gradient of the task objective with Mo...
