Multitask Bandit Learning through Heterogeneous Feedback Aggregation
AISTATS, pp. 1531-1539, 2021.
In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol. We formulate this problem as the $\epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms, and for each arm, the re...
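As a rough illustration of the interaction protocol described above, the following Python sketch simulates several players pulling arms in the same round. The abstract is cut off before it states what $\epsilon$ controls, so the reading used here (for each arm, any two players' mean rewards differ by at most $\epsilon$) is an assumption, as are the Bernoulli rewards and the hypothetical helper names `make_instance`, `max_dissimilarity`, and `play_round`. It is not the paper's algorithm, only a toy environment.

```python
import random


def make_instance(num_players, num_arms, epsilon, rng):
    """Build per-player Bernoulli means (assumed setup, not from the source).

    Each arm gets a shared base mean; every player's mean for that arm is the
    base plus a shift in [-epsilon/2, epsilon/2], so any two players differ by
    at most epsilon on any arm. Requires 0 <= epsilon <= 1.
    """
    base = [rng.uniform(epsilon / 2, 1 - epsilon / 2) for _ in range(num_arms)]
    return [
        [b + rng.uniform(-epsilon / 2, epsilon / 2) for b in base]
        for _ in range(num_players)
    ]


def max_dissimilarity(means):
    """Largest gap, over all arms, between any two players' mean rewards."""
    num_arms = len(means[0])
    return max(
        max(row[a] for row in means) - min(row[a] for row in means)
        for a in range(num_arms)
    )


def play_round(means, choices, rng):
    """All players pull concurrently; player p pulls arm choices[p].

    Returns one 0/1 Bernoulli reward per player.
    """
    return [
        1 if rng.random() < means[p][arm] else 0
        for p, arm in enumerate(choices)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    means = make_instance(num_players=3, num_arms=4, epsilon=0.1, rng=rng)
    print("max dissimilarity:", max_dissimilarity(means))
    print("rewards this round:", play_round(means, [0, 1, 2], rng))
```

Under this assumed reading, a player could in principle borrow other players' observations for an arm at the cost of a bias of at most $\epsilon$, which is one way to picture the feedback aggregation named in the title.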