Near Optimal Adversarial Attacks on Stochastic Bandits and Defenses with Smoothed Responses
International Conference on Artificial Intelligence and Statistics (2020)
Abstract
I study adversarial attacks against stochastic bandit algorithms. At each
round, the learner chooses an arm, and a stochastic reward is generated. The
adversary strategically adds corruption to the reward, and the learner is only
able to observe the corrupted reward at each round. Two sets of results are
presented in this paper. The first set studies the optimal attack strategies
for the adversary. The adversary has a target arm he wishes to promote, and his
goal is to manipulate the learner into choosing this target arm T - o(T)
times. I design attack strategies against UCB and Thompson sampling that
spend only O(√(log T)) cost. Matching lower bounds are presented,
and the vulnerability of UCB, Thompson sampling, and ε-greedy is
exactly characterized. The second set studies how the learner can defend
against the adversary. Inspired by literature on smoothed analysis and
behavioral economics, I present two simple algorithms that achieve a
competitive ratio arbitrarily close to 1.
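
The interaction protocol described above (the learner pulls an arm, a stochastic reward is drawn, the adversary corrupts it, and the learner sees only the corrupted value) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the attack strategy of the paper: the function `run_attack`, its parameter names, the Gaussian reward model, and the naive fixed-offset corruption are all hypothetical choices made for the example. The learner is a textbook UCB1 rule, and the attack cost is taken to be the total absolute corruption.

```python
import math
import random


def run_attack(means, target, horizon, delta=1.0, sigma=0.1, seed=0):
    """Simulate UCB1 against a naive reward-corruption adversary.

    Assumption (not from the paper): the adversary subtracts a fixed
    offset `delta` from every reward of a non-target arm and leaves the
    target arm untouched. Returns per-arm pull counts and total cost.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # sum of *corrupted* rewards per arm
    cost = 0.0            # total absolute corruption spent by the adversary

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialise
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )

        reward = rng.gauss(means[arm], sigma)          # true stochastic reward
        corruption = 0.0 if arm == target else -delta  # adversary's move
        cost += abs(corruption)

        # The learner only ever observes the corrupted reward.
        counts[arm] += 1
        sums[arm] += reward + corruption

    return counts, cost


if __name__ == "__main__":
    counts, cost = run_attack([0.9, 0.5, 0.2], target=2, horizon=10_000)
    print("pulls per arm:", counts)
    print("attack cost:", cost)
```

In this sketch the adversary pays `delta` on every non-target pull, so its cost equals `delta` times the number of non-target pulls. That is a deliberately crude baseline; the abstract's O(√(log T)) cost refers to the paper's own attack strategies, which are not reproduced here.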