UCB and InfoGain Exploration via $\boldsymbol{Q}$-Ensembles
arXiv: Learning, Volume abs/1706.01502, 2017.
Abstract:
We show how an ensemble of $Q^*$-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting and adapt them to the $Q$-learning setting. First, we propose an exploration strategy based on upper-confidence bounds (UCB). Next, we define an ...
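To make the UCB idea concrete, here is a minimal sketch of ensemble-based action selection. It assumes the common bandit-style form in which the ensemble mean is the value estimate and the ensemble standard deviation is the uncertainty bonus. The function name `ucb_action`, the weight `lam`, and the `(K, A)` array layout are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np


def ucb_action(q_values: np.ndarray, lam: float = 0.1) -> int:
    """Pick an action from an ensemble of Q-value estimates.

    q_values: array of shape (K, A), one row per ensemble member and
              one column per action (hypothetical layout).
    lam:      weight on the uncertainty bonus (hypothetical name).
    """
    mean = q_values.mean(axis=0)  # ensemble mean per action
    std = q_values.std(axis=0)    # ensemble disagreement per action
    return int(np.argmax(mean + lam * std))


# Three ensemble members, three actions. The members agree on action 0
# and disagree strongly on action 1.
q_values = np.array([
    [1.0, 0.5, 0.2],
    [1.0, 1.5, 0.1],
    [1.0, 0.1, 0.3],
])

greedy = ucb_action(q_values, lam=0.0)      # no bonus: follows the mean
optimistic = ucb_action(q_values, lam=2.0)  # bonus favors the uncertain action
print(greedy, optimistic)
```

With `lam=0.0` the rule reduces to greedy selection on the ensemble mean. Increasing `lam` shifts the choice toward actions on which the ensemble members disagree.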