Recharging Bandits.

FOCS 2018

Cited by 46
Abstract
We introduce a general model of bandit problems in which the expected payout of an arm is an increasing concave function of the time since it was last played. We first develop a PTAS for the underlying optimization problem of determining a reward-maximizing sequence of arm pulls. We then show how to use this PTAS in a learning setting to obtain sublinear regret.
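As a concrete illustration of the model, the sketch below simulates recharging arms whose expected payout is an increasing concave function of the time since the arm was last played, and runs a simple greedy policy against them. The square-root recharge curves, the noise model, and the greedy rule are assumptions made for illustration only; this is not the paper's PTAS or learning algorithm.

```python
import math
import random

def make_arm(scale):
    """Return a concave recharge curve f(delay) = scale * sqrt(delay) (illustrative assumption)."""
    return lambda delay: scale * math.sqrt(delay)

def simulate_greedy(arms, horizon, seed=0):
    """Each round, pull the arm with the largest current expected payout."""
    rng = random.Random(seed)
    last_pulled = [0] * len(arms)          # time step of each arm's last pull
    total_reward = 0.0
    for t in range(1, horizon + 1):
        delays = [t - last_pulled[i] for i in range(len(arms))]
        expected = [arms[i](delays[i]) for i in range(len(arms))]
        choice = max(range(len(arms)), key=lambda i: expected[i])
        # Realized reward: expected value plus small bounded noise (assumption).
        total_reward += expected[choice] + rng.uniform(-0.1, 0.1)
        last_pulled[choice] = t
    return total_reward

if __name__ == "__main__":
    arms = [make_arm(s) for s in (1.0, 0.8, 0.5)]
    print("greedy total reward over 1000 rounds:", simulate_greedy(arms, 1000))
```

Because pulling an arm resets its recharge clock, a good schedule must trade off high-value arms against letting other arms accumulate payout; the greedy rule above is only a baseline for exploring that trade-off.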
Keywords
multi-armed bandit problems, learning theory, scheduling, approximation algorithms