Reinforcement Learning with Dynamic Boltzmann Softmax Updates

IJCAI 2020.

We propose the dynamic Boltzmann softmax operator for value function estimation, with a time-varying, state-independent parameter.

Abstract:

Value function estimation, i.e., prediction, is an important task in reinforcement learning. The operator commonly used for prediction in Q-learning is the hard max operator, which always commits to the maximum action-value under the current estimate. Such a 'hard' updating scheme results in pure exploitation and may lead to misbeha...
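To make the contrast between the two operators concrete, here is a minimal Python sketch, not the authors' implementation. It uses the standard definition of the Boltzmann softmax operator, a softmax-weighted average of the action-values with temperature parameter beta. The function names and the schedule `beta_schedule` (beta_t = t ** power) are hypothetical, chosen only to illustrate a parameter that varies with time but not with state; the schedule analyzed in the paper may differ.

```python
import numpy as np


def hard_max(q_values):
    """Hard max operator: commit to the largest action-value."""
    return float(np.max(q_values))


def boltzmann_softmax(q_values, beta):
    """Boltzmann softmax operator: softmax-weighted average of action-values.

    beta = 0 gives the plain mean; large beta approaches the hard max.
    """
    q = np.asarray(q_values, dtype=float)
    # Shift by the max before exponentiating for numerical stability;
    # the shift cancels when the weights are normalized.
    weights = np.exp(beta * (q - q.max()))
    weights /= weights.sum()
    return float(np.dot(weights, q))


def beta_schedule(t, power=2.0):
    """Hypothetical time-varying, state-independent schedule (an assumption)."""
    return float(t) ** power


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    for t in (0, 1, 2, 10):
        beta_t = beta_schedule(t)
        print(t, beta_t, boltzmann_softmax(q, beta_t), hard_max(q))
```

With beta = 0 the operator returns the mean of the action-values (2.0 for the values above), and as beta grows the result moves toward the hard max (3.0). A schedule that increases beta over time therefore shifts the update from an averaging estimate toward the usual Q-learning target.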