Reinforcement Learning with Dynamic Boltzmann Softmax Updates


Abstract:

Value function estimation, i.e., prediction, is an important task in reinforcement learning. The operator commonly used for prediction in Q-learning is the hard max operator, which always commits to the maximum action-value under the current estimate. Such a 'hard' updating scheme results in pure exploitation and may lead to misbehavi...
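
To make the contrast concrete, here is a minimal sketch of the hard max operator next to a Boltzmann softmax operator over a list of action-values. The function names, the sample values, and the fixed inverse temperature `beta` are assumptions for illustration only. The truncated abstract does not specify how the paper varies the temperature over time, so no schedule is shown.

```python
import math

def hard_max(q_values):
    """Hard max operator: commit fully to the largest action-value estimate."""
    return max(q_values)

def boltzmann_softmax(q_values, beta):
    """Boltzmann softmax operator: average of the action-values weighted by exp(beta * Q).

    beta is a hypothetical fixed inverse temperature (an assumption of this sketch).
    beta = 0 gives the plain mean; large beta approaches the hard max.
    """
    m = max(q_values)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total

# Illustrative action-value estimates for a single state (made-up numbers).
q = [1.0, 2.0, 3.0]
```

With these sample values, `boltzmann_softmax(q, 0.0)` equals the mean of the estimates, and raising `beta` moves the result toward `hard_max(q)`. A small `beta` therefore gives a softer update that does not commit entirely to the current maximum.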
