Optimal and Greedy Algorithms for Multi-Armed Bandits with Many Arms

Abstract:

We characterize Bayesian regret in a stochastic multi-armed bandit problem with a large but finite number of arms. In particular, we assume the number of arms $k$ is $T^{\alpha}$, where $T$ is the time horizon and $\alpha$ is in $(0,1)$. We consider a Bayesian setting where the reward distribution of each arm is drawn independently from...
