Equivalence Between Policy Gradients and Soft Q-Learning

Other Links: arxiv.org

Abstract:

Two of the leading approaches to model-free reinforcement learning are policy gradient methods and $Q$-learning methods. $Q$-learning methods can be effective and sample-efficient when they work; however, it is not well understood why they work, since, empirically, the $Q$-values they estimate are very inaccurate. A partial explanation ...
