Mistake bounds on the noise-free multi-armed bandit game

We study the {0,1}-loss version of the adaptive adversarial multi-armed bandit problem in which α (≥ 1) of the arms are lossless. For this problem, we show a tight bound of K − α − Θ(1/T) on the minimax expected number of mistakes (1-losses), where K is the number of arms and T is the number of rounds.
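
For orientation, the leading term K − α can be read off from a naive elimination player: since a lossless arm never incurs a 1-loss, any arm observed with loss 1 can be discarded for good, and each mistake discards one of the K − α non-lossless arms. The sketch below illustrates only this baseline, not the strategy behind the tight bound. The function name `play_elimination`, the choice of arms 0..α−1 as the lossless ones, and the fixed adversary that assigns loss 1 to every other arm in every round are assumptions made for illustration; the adversary in the problem itself is adaptive.

```python
import random


def play_elimination(K, alpha, T, seed=0):
    """Simulate a naive elimination player and return its number of mistakes.

    Assumptions (illustrative, not from the paper): arms 0..alpha-1 are the
    lossless arms, and every other arm incurs loss 1 in every round.
    """
    rng = random.Random(seed)
    lossless = set(range(alpha))

    # Arms not yet observed with loss 1, in a random order unknown to the adversary.
    candidates = list(range(K))
    rng.shuffle(candidates)

    mistakes = 0
    for _ in range(T):
        arm = candidates[-1]  # keep playing the current candidate
        loss = 0 if arm in lossless else 1
        if loss == 1:
            mistakes += 1
            candidates.pop()  # an arm with a 1-loss cannot be lossless
    return mistakes


if __name__ == "__main__":
    K, alpha, T = 5, 2, 100
    worst = max(play_elimination(K, alpha, T, seed=s) for s in range(1000))
    print(worst, "<=", K - alpha)
```

Because lossless arms are never discarded and α ≥ 1, the candidate list never becomes empty, so the player makes at most min(T, K − α) mistakes against any loss sequence consistent with the noise-free assumption.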



