Reinforcement Learning with Dynamic Boltzmann Softmax Updates
IJCAI 2020.
We propose the dynamic Boltzmann softmax operator for value function estimation, with a time-varying, state-independent parameter.
Value function estimation, i.e., prediction, is an important task in reinforcement learning. The operator commonly used for prediction in Q-learning is the hard max operator, which always commits to the maximum action-value under the current estimate. Such a 'hard' updating scheme results in pure exploitation and may lead to misbeha...
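To make the contrast with the hard max operator concrete, the following is a minimal sketch of a Boltzmann softmax operator over a list of action-values, assuming the standard definition (an average of the action-values weighted by exp(beta * q)). The function names `boltzmann_softmax` and `beta_schedule`, and the schedule beta_t = t^2, are illustrative assumptions and are not taken from the paper; the only property used here is that the parameter depends on the time step and not on the state.

```python
import math


def boltzmann_softmax(q_values, beta):
    """Boltzmann-weighted average of the action-values in q_values.

    beta = 0 gives the plain mean; as beta grows, the result approaches
    max(q_values), i.e. the hard max operator.
    """
    # Subtract the maximum before exponentiating for numerical stability;
    # this leaves the normalized weights unchanged.
    q_max = max(q_values)
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


def beta_schedule(t, power=2):
    """Hypothetical time-varying, state-independent parameter: beta_t = t**power."""
    return float(t ** power)


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    for t in (0, 1, 5):
        beta_t = beta_schedule(t)
        print(t, beta_t, boltzmann_softmax(q, beta_t))
```

With a small parameter the operator averages over actions, and as the parameter grows it moves toward the hard max, so an increasing schedule shifts the update from 'soft' to 'hard' over time.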