We study a Bayesian multi-armed bandit (MAB) setting in which a principal seeks to maximize the sum of expected time-discounted rewards obtained by pulling arms, when the arms are actually pulled by selfish and myopic individuals. Since such individuals pull the arm with highest expected posterior reward (i.e., they always exploit and never explore), the principal must incentivize them to explore by offering suitable payments. Among others, this setting models crowdsourced information discovery and funding agencies incentivizing scien...
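The incentive gap described above can be illustrated with a minimal sketch. The following is a hypothetical toy model, not the paper's mechanism: Bernoulli arms with uniform Beta(1, 1) priors, a myopic agent who pulls the arm with the highest posterior mean, and a principal who must pay at least the posterior-mean gap to make the agent pull any other arm.

```python
# Hypothetical sketch (not the paper's mechanism): Bernoulli arms with
# Beta(1, 1) priors. A myopic agent pulls the arm with the highest
# posterior mean; to make it pull any other arm, the principal must pay
# at least the gap in posterior means.

def posterior_mean(successes, failures):
    # Mean of the Beta(1 + successes, 1 + failures) posterior.
    return (1 + successes) / (2 + successes + failures)

def myopic_choice(stats):
    # stats: list of (successes, failures) pairs, one per arm.
    return max(range(len(stats)), key=lambda a: posterior_mean(*stats[a]))

def payment_to_explore(stats, target):
    # Minimum payment making a myopic agent weakly prefer `target`
    # over the myopically best arm (zero if `target` already wins).
    best = myopic_choice(stats)
    gap = posterior_mean(*stats[best]) - posterior_mean(*stats[target])
    return max(0.0, gap)

stats = [(3, 1), (0, 0)]  # arm 0 looks good; arm 1 is unexplored
print(myopic_choice(stats))          # arm 0: posterior mean 4/6 > 1/2
print(payment_to_explore(stats, 1))  # gap = 4/6 - 1/2 = 1/6
```

Here the unexplored arm 1 may well be better, but a myopic agent never tries it for free; the principal closes the gap with a payment of 1/6.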
Proceedings of the Fifteenth ACM Conference on Economics and Computation (EC '14), pp. 5-22, 2014.