Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents
CoRR (2023)
Abstract
We consider a variant of the stochastic multi-armed bandit problem.
Specifically, the arms are strategic agents who can improve their rewards or
absorb them. The utility of an agent increases if she is pulled more or absorbs
more of her rewards but decreases if she spends more effort improving her
rewards. Agents are heterogeneous: they have different means and can improve
their rewards up to different levels. Further, a
non-empty subset of agents are "honest" and in the worst case always give
their rewards without absorbing any part. The principal wishes to obtain a high
revenue (cumulative reward) by designing a mechanism that incentivizes top-level
performance at equilibrium. At the same time, the principal wishes to be robust
and obtain revenue at least at the level of the honest agent with the highest
mean in case of non-equilibrium behaviour. We identify a class of MAB
algorithms, which we call performance-incentivizing, that satisfy a collection
of properties, and show that they lead to mechanisms that incentivize top-level
performance at equilibrium and are robust under any strategy profile.
Interestingly, we show that UCB is an example of such an MAB algorithm. Further,
in the case where the top performance level is unknown, we show that combining
second-price auction ideas with performance-incentivizing algorithms achieves
performance at least at the second-highest level while also being robust.
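
For readers unfamiliar with UCB, the following is a minimal sketch of the textbook UCB1 index rule, which is the kind of algorithm the abstract refers to. It is not the paper's mechanism: the arms here are plain Bernoulli reward sources rather than strategic agents, and the names `ucb1`, `make_bernoulli_arms`, the example means, and the exploration constant 2 are all illustrative assumptions.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run textbook UCB1; `pull(i)` must return a reward in [0, 1] for arm i."""
    counts = [0] * n_arms   # number of times each arm has been pulled
    sums = [0.0] * n_arms   # total reward observed from each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # pull every arm once so all counts are positive
        else:
            # index = empirical mean + exploration bonus
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums


def make_bernoulli_arms(means, seed=0):
    """Hypothetical non-strategic arms: arm i pays 1 with probability means[i]."""
    rng = random.Random(seed)

    def pull(i):
        return 1.0 if rng.random() < means[i] else 0.0

    return pull


if __name__ == "__main__":
    counts, sums = ucb1(make_bernoulli_arms([0.2, 0.5, 0.9]), n_arms=3, horizon=2000)
    print("pulls per arm:", counts)
    print("revenue (cumulative reward):", sum(sums))
```

In this sketch an arm with a higher empirical mean is pulled more often over time, which loosely illustrates why an agent whose utility grows with the number of pulls might prefer to deliver higher rewards; the paper's formal properties and equilibrium analysis are not reproduced here.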