One Practical Algorithm for Both Stochastic and Adversarial Bandits.
Science & Engineering Faculty (2014)
Abstract
We present an algorithm for multiarmed bandits
that achieves almost optimal performance in both
stochastic and adversarial regimes without prior
knowledge about the nature of the environment.
Our algorithm is based on augmentation of the
EXP3 algorithm with a new control lever in the
form of exploration parameters that are tailored
individually for each arm. The algorithm simultaneously
applies the “old” control lever, the
learning rate, to control the regret in the adversarial
regime and the new control lever to detect
and exploit gaps between the arm losses. This
secures problem-dependent “logarithmic” regret
when gaps are present without compromising on
the worst-case performance guarantee in the adversarial
regime. We show that the algorithm can
exploit both the usual expected gaps between the
arm losses in the stochastic regime and deterministic
gaps between the arm losses in the adversarial
regime. The algorithm retains a “logarithmic”
regret guarantee in the stochastic regime
even when some observations are contaminated
by an adversary, as long as on average the contamination
does not reduce the gap by more than
a half. Our results for the stochastic regime are
supported by experimental validation.
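
The two control levers described above can be illustrated with a minimal Python sketch. It is not the authors' exact algorithm: the abstract does not state the parameter schedules, so the learning rate `eta`, the cap on the exploration rates, and the caller-supplied `explore_fn` are assumptions made for illustration, as are all function and variable names. The sketch only shows the general shape: EXP3's exponential weights, driven by the learning rate, are mixed with an exploration rate chosen individually for each arm.

```python
import math
import random


def sampling_distribution(cum_loss_est, eta, eps):
    """Mix EXP3 exponential weights with per-arm exploration rates eps[a]."""
    shift = min(cum_loss_est)  # subtract the minimum for numerical stability
    weights = [math.exp(-eta * (L - shift)) for L in cum_loss_est]
    total = sum(weights)
    rho = [w / total for w in weights]  # plain EXP3 (Gibbs) distribution
    slack = 1.0 - sum(eps)              # probability mass left after exploration
    return [slack * r + e for r, e in zip(rho, eps)]


def run(loss_fn, n_arms, horizon, explore_fn, seed=0):
    """Play `horizon` rounds and return the total loss suffered.

    loss_fn(t, arm, rng) -> loss in [0, 1]      (hypothetical environment hook)
    explore_fn(t, arm)   -> non-negative rate   (hypothetical per-arm lever)
    Assumes n_arms >= 2.
    """
    rng = random.Random(seed)
    cum_loss_est = [0.0] * n_arms  # importance-weighted cumulative loss estimates
    total_loss = 0.0
    for t in range(1, horizon + 1):
        # Assumed schedules (illustrative, not taken from the abstract).
        beta = 0.5 * math.sqrt(math.log(n_arms) / (t * n_arms))
        eta = 2.0 * beta                        # "old" lever: learning rate
        cap = min(1.0 / (2 * n_arms), beta)     # keeps sum(eps) <= 1/2
        eps = [min(cap, explore_fn(t, a)) for a in range(n_arms)]  # new lever
        p = sampling_distribution(cum_loss_est, eta, eps)
        arm = rng.choices(range(n_arms), weights=p)[0]
        loss = loss_fn(t, arm, rng)
        total_loss += loss
        cum_loss_est[arm] += loss / p[arm]      # unbiased estimate for played arm
    return total_loss
```

In a stochastic test one might pass `loss_fn = lambda t, a, rng: float(rng.random() < (0.2, 0.8)[a])` and `explore_fn = lambda t, a: 1.0 / t`. A gap-aware choice of `explore_fn`, for instance one that shrinks faster for arms whose estimated gap is large, is where the per-arm lever would do its work.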
Keywords
adversarial bandits, stochastic, practical algorithm