On the robustness of a one-period look-ahead policy in multi-armed bandit problems

Procedia Computer Science (2010)

Abstract
We analyze the robustness of a knowledge gradient (KG) policy for the multi-armed bandit problem. The KG policy is based on a one-period look-ahead, which is known to underperform in other learning problems when the marginal value of information is non-concave. We present an adjustment that corrects for non-concavity and approximates a multi-step look-ahead, and compare its performance to the unadjusted KG policy and other heuristics. We provide guidance for determining when adjustment will improve performance, and when it is unnecessary. We present evidence suggesting that KG is generally robust in the multi-armed bandit setting, which argues in favour of KG as an alternative to index policies. (C) 2010 Published by Elsevier Ltd.
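The one-period look-ahead behind KG can be illustrated with a minimal sketch. It assumes independent normal beliefs about each arm's mean and known Gaussian measurement noise. That is a common setting for KG, but this abstract does not spell it out. The helper names (`kg_value`, `kg_online_choice`, `kg_star_value`) are hypothetical. `kg_value` computes the expected increase in the best posterior mean from measuring one arm. `kg_online_choice` adds that value, scaled by a user-chosen `multiplier` for future periods (for example γ/(1-γ) under discounting), to each arm's current mean. `kg_star_value` shows one illustrative way to handle a non-concave value of information. It takes the best average value per measurement over m repeated measurements of the same arm. The paper's own adjustment may differ in its details.

```python
import math
from typing import Sequence


def _pdf(z: float) -> float:
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z: float) -> float:
    # Standard normal CDF via erfc (more accurate in the far left tail).
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def _f(z: float) -> float:
    # f(z) = z * Phi(z) + phi(z), the usual KG expectation term.
    return z * _cdf(z) + _pdf(z)


def kg_value(mu: Sequence[float], sigma2: Sequence[float],
             noise_var: float, x: int, m: int = 1) -> float:
    """Expected gain in max posterior mean from m measurements of arm x.

    Assumes independent normal priors N(mu[i], sigma2[i]) and i.i.d.
    Gaussian measurement noise with known variance noise_var.
    """
    # Posterior variance of arm x after m more measurements.
    post_var = 1.0 / (1.0 / sigma2[x] + m / noise_var)
    # Std. dev. of the change in the posterior mean of arm x.
    sigma_tilde = math.sqrt(max(sigma2[x] - post_var, 0.0))
    if sigma_tilde == 0.0:
        return 0.0
    best_other = max(mu[y] for y in range(len(mu)) if y != x)
    z = -abs(mu[x] - best_other) / sigma_tilde
    # Clamp tiny negative values caused by floating-point rounding.
    return max(sigma_tilde * _f(z), 0.0)


def kg_online_choice(mu: Sequence[float], sigma2: Sequence[float],
                     noise_var: float, multiplier: float) -> int:
    """Pick the arm maximizing immediate mean + multiplier * one-step KG."""
    scores = [mu[x] + multiplier * kg_value(mu, sigma2, noise_var, x)
              for x in range(len(mu))]
    return max(range(len(mu)), key=scores.__getitem__)


def kg_star_value(mu: Sequence[float], sigma2: Sequence[float],
                  noise_var: float, x: int, max_m: int = 100) -> float:
    """Best average value per measurement over m = 1..max_m measurements.

    Illustrative non-concavity adjustment (hypothetical parameter max_m).
    """
    return max(kg_value(mu, sigma2, noise_var, x, m) / m
               for m in range(1, max_m + 1))
```

With noisy measurements (large `noise_var` relative to the prior variance), one measurement barely moves the posterior, so `kg_value` with m=1 can be nearly zero while `kg_star_value` is noticeably larger. This is, roughly, the non-concavity that makes a pure one-period look-ahead undervalue information.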
Keywords
multi-armed bandit, knowledge gradient, optimal learning, Bayesian learning