Reinforcement Learning with Dynamic Boltzmann Softmax Updates
IJCAI 2020.
Abstract:
Value function estimation, i.e., prediction, is an important task in reinforcement learning. The commonly used operator for prediction in Q-learning is the hard max operator, which always commits to the maximum action-value according to the current estimate. Such a 'hard' updating scheme results in pure exploitation and may lead to misbeha...
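To make the contrast in the abstract concrete, here is a minimal sketch comparing the hard max operator with the standard Boltzmann softmax operator over a list of action-values. The function names and the inverse-temperature parameter `beta` are illustrative assumptions; the abstract as shown here does not specify the paper's dynamic schedule for this parameter, so it is a fixed argument in this sketch.

```python
import math


def hard_max(q_values):
    """Hard max operator: commits fully to the largest action-value."""
    return max(q_values)


def boltzmann_softmax(q_values, beta):
    """Boltzmann softmax operator: exp(beta * Q)-weighted average of Q.

    `beta` (assumed name) is the inverse temperature. beta = 0 gives the
    plain mean; large beta approaches the hard max.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(q_values)
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total
```

With a small `beta` the operator averages over all actions, and as `beta` grows the value approaches the one returned by `hard_max`.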