Recharging Bandits.

FOCS 2018

Cited by 46
Abstract
We introduce a general model of bandit problems in which the expected payout of an arm is an increasing concave function of the time since it was last played. We first develop a PTAS for the underlying optimization problem of determining a reward-maximizing sequence of arm pulls. We then show how to use this PTAS in a learning setting to obtain sublinear regret.
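As a concrete illustration of the model, the sketch below simulates recharging arms whose expected payout is an increasing concave function of the time since the arm was last played, and runs a simple greedy policy against them. The square-root recharge curves, the noise model, and the greedy rule are assumptions made for illustration only; this is not the paper's PTAS or learning algorithm.

```python
import math
import random

def make_arm(scale):
    """Return a concave recharge curve f(delay) = scale * sqrt(delay) (illustrative assumption)."""
    return lambda delay: scale * math.sqrt(delay)

def simulate_greedy(arms, horizon, seed=0):
    """Each round, pull the arm with the largest current expected payout."""
    rng = random.Random(seed)
    last_pulled = [0] * len(arms)          # time step of each arm's last pull
    total_reward = 0.0
    for t in range(1, horizon + 1):
        delays = [t - last_pulled[i] for i in range(len(arms))]
        expected = [arms[i](delays[i]) for i in range(len(arms))]
        choice = max(range(len(arms)), key=lambda i: expected[i])
        # Realized reward: expected value plus small bounded noise (assumption).
        total_reward += expected[choice] + rng.uniform(-0.1, 0.1)
        last_pulled[choice] = t
    return total_reward

if __name__ == "__main__":
    arms = [make_arm(s) for s in (1.0, 0.8, 0.5)]
    print("greedy total reward over 1000 rounds:", simulate_greedy(arms, 1000))
```

Because pulling an arm resets its recharge clock, a good schedule must trade off high-value arms against letting other arms accumulate payout; the greedy rule above is only a baseline for exploring that trade-off.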
Keywords
multi-armed bandit problems, learning theory, scheduling, approximation algorithms