Augment-Reinforce-Merge Policy Gradient for Binary Stochastic Policy
arXiv: Learning, 2019.
Due to the high variance of policy gradients, on-policy optimization algorithms are plagued with low sample efficiency. In this work, we propose Augment-Reinforce-Merge (ARM) policy gradient estimator as an unbiased low-variance alternative to previous baseline estimators on tasks with binary action space, inspired by the recent ARM gradi...More
PPT (Upload PPT)