Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods
CoRL, pp. 1379-1394, 2019.
Policy gradient methods have demonstrated success in reinforcement learning tasks that have high-dimensional continuous state and action spaces. However, policy gradient methods are also notoriously sample inefficient. This can be attributed, at least in part, to the high variance in estimating the gradient of the task objective with Mo...