Equivalence Between Policy Gradients and Soft Q-Learning

arXiv: Learning, Volume abs/1704.06440, 2017.

Abstract:

Two of the leading approaches for model-free reinforcement learning are policy gradient methods and $Q$-learning methods. $Q$-learning methods can be effective and sample-efficient when they work; however, it is not well understood why they work, since empirically the $Q$-values they estimate are very inaccurate. A partial explanation ma...
