UCB and InfoGain Exploration via $\boldsymbol{Q}$-Ensembles

arXiv preprint arXiv:1706.01502 [cs.LG], 2017.


Abstract:

We show how an ensemble of $Q^*$-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting and adapt them to the $Q$-learning setting. First, we propose an exploration strategy based on upper-confidence bounds (UCB). Next, we define an "InfoGain" exploration bonus…
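
Below is a minimal sketch of the UCB action-selection rule the abstract refers to: given a $Q$-ensemble, the agent acts greedily with respect to the ensemble mean plus a multiple of the ensemble standard deviation (the heads' disagreement). The function name `ucb_action`, the coefficient `lam`, and the NumPy-based interface are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ucb_action(q_values, lam=0.1):
    """Select the action maximizing ensemble mean + lam * ensemble std.

    q_values : ndarray of shape (K, num_actions), Q-estimates for the
        current state from each of the K ensemble heads.
    lam      : exploration coefficient weighting the heads' disagreement
               (hyperparameter, name assumed for this sketch).
    """
    mean = q_values.mean(axis=0)   # empirical mean over the K heads
    std = q_values.std(axis=0)     # empirical std across heads = uncertainty bonus
    return int(np.argmax(mean + lam * std))

# Toy usage: 10 ensemble heads, 4 discrete actions.
rng = np.random.default_rng(0)
q_heads = rng.normal(size=(10, 4))
print(ucb_action(q_heads, lam=0.1))
```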
