Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints

IJCAI 2020(2020)

引用 9|浏览120
暂无评分
摘要
The problem of multi-armed bandit (MAB) with fairness constraint has emerged as an important research topic recently. For such problems, one common objective is to maximize the total rewards within a fixed round of pulls, while satisfying the fairness requirement of a minimum selection fraction for each individual arm in the long run. Previous works have made substantial advancements in designing efficient online selection solutions, however, they fail to achieve a sublinear regret bound when incorporating such fairness constraints. In this paper, we study a combinatorial MAB problem with concave objective and fairness constraints. In particular, we adopt a new approach that combines online convex optimization with bandit methods to design selection algorithms. Our algorithm is computationally efficient, and more importantly, manages to achieve a sublinear regret bound with probability guarantees. Finally, we evaluate the performance of our algorithm via extensive simulations and demonstrate that it outperforms the baselines substantially.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络