Best Action Selection In A Stochastic Environment
AAMAS '16: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems (2016)
Abstract
We study the problem of selecting the best action from multiple candidates in a stochastic environment. In this stochastic setting, taking an action yields a random reward and incurs a random cost, both drawn from unknown distributions. We aim to select the best action, i.e., the one with the maximum ratio of expected reward to expected cost, after exploring the actions for n rounds. In particular, we study three mechanisms: (i) the uniform exploration mechanism MU; (ii) the successive elimination mechanism MSE; and (iii) the ratio confidence bound exploration mechanism MRCB. We prove that for all three mechanisms, the probability that the best action is not selected (i.e., the error probability) can be upper bounded by O(exp(-cn)), where c is a constant depending on the mechanism and on coefficients of the actions. We then give an asymptotic lower bound on the error probability of consistent mechanisms in the Bernoulli setting, and discuss its relationship with the upper bounds in different respects. Our proposed mechanisms degenerate to cover the cases where only the rewards/costs are random. We also test the proposed mechanisms through numerical experiments.
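As a rough illustration of the setting, the uniform exploration mechanism (MU in the abstract) can be sketched as follows: pull every action equally often, then report the action with the largest empirical reward-to-cost ratio. The function names and the round-robin schedule here are assumptions for illustration, not the paper's exact specification.

```python
import random

def uniform_exploration(actions, n, seed=0):
    """Round-robin over all actions for n rounds, then return the index
    of the action with the largest empirical reward/cost ratio.

    Each element of `actions` is a callable taking an RNG and returning
    a (reward, cost) sample from its unknown distributions.
    """
    rng = random.Random(seed)
    k = len(actions)
    totals = [[0.0, 0.0] for _ in range(k)]  # per-action [sum reward, sum cost]
    for t in range(n):
        i = t % k  # uniform (round-robin) allocation of pulls
        reward, cost = actions[i](rng)
        totals[i][0] += reward
        totals[i][1] += cost
    # Empirical ratio estimate; guard against a zero empirical cost.
    ratios = [r / c if c > 0 else float("-inf") for r, c in totals]
    return max(range(k), key=lambda i: ratios[i])

# Example with two Bernoulli-style actions: action 1 has the higher
# expected reward/cost ratio and should be selected for large n.
actions = [
    lambda rng: (float(rng.random() < 0.5), 1.0),  # ratio ≈ 0.5
    lambda rng: (float(rng.random() < 0.8), 1.0),  # ratio ≈ 0.8
]
best = uniform_exploration(actions, 1000)
```

The successive elimination and ratio confidence bound mechanisms refine this schedule by discarding or deprioritizing actions whose confidence intervals on the ratio fall below the current leader's.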
Keywords
Design, Economics, Bandit Algorithm, Stochastic