Equivalence Between Policy Gradients and Soft Q-Learning
arXiv: Learning, Volume abs/1704.06440, 2017.
Two of the leading approaches for model-free reinforcement learning are policy gradient methods and $Q$-learning methods. $Q$-learning methods can be effective and sample-efficient when they work; however, it is not well understood why they work, since empirically the $Q$-values they estimate are very inaccurate. A partial explanation ma...