Batched Multi-Armed Bandits with Optimal Regret

Esfandiari Hossein, Karbasi Amin, Mehrabian Abbas, Mirrokni Vahab

arXiv (2019)

Citations: 2 | Views: 54

Abstract

We present a simple and efficient algorithm for the batched stochastic multi-armed bandit problem. We prove a bound on its expected regret that improves on the best-known regret bound for any number of batches. In particular, our algorithm achieves the optimal expected regret using only a logarithmic number of batches.
