Multi-Agent Multi-Armed Bandits with Limited Communication

Mridul Agarwal, Vaneet Aggarwal, Kamyar Azizzadenesheli

arXiv (2022)

Abstract

We consider the problem where $N$ agents collaboratively interact with an instance of a stochastic $K$-armed bandit problem with $K \gg N$. The agents aim to simultaneously minimize three quantities: the cumulative regret over all agents across a total of $T$ time steps, the number of communication rounds, and the number of bits in each communication round. We present Limited Communication Collaboration - Upper Confidence Bound (LCC-UCB), a doubling-epoch based algorithm in which each agent communicates only at the end of an epoch and shares the index of the best arm it knows. With LCC-UCB, each agent enjoys a regret of $\tilde{O}(\sqrt{(K/N + N)T})$, communicates for $O(\log T)$ steps, and broadcasts $O(\log K)$ bits in each communication step. We extend the work to sparse graphs with maximum degree $K_G$ and diameter $D$ and propose LCC-UCB-GRAPH, which enjoys a regret bound of $\tilde{O}(D\sqrt{(K/N + K_G)DT})$. Finally, we empirically show that LCC-UCB and LCC-UCB-GRAPH perform well and outperform strategies that communicate through a central node.
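The abstract describes LCC-UCB only at a high level, so the following Python sketch is a hypothetical illustration of the doubling-epoch communication pattern, not the authors' algorithm. It assumes Bernoulli rewards, a round-robin initial split of the K arms across the N agents (about K/N arms each), plain UCB1 inside each agent, and "best arm" meaning the agent's most-played arm; the function name `doubling_epoch_sketch` and its parameters are illustrative. It shows why epochs of length 1, 2, 4, ... yield only about log2(T) communication rounds, and why each broadcast arm index costs ceil(log2 K) bits.

```python
import math
import random


def ucb_pick(counts, sums, arms, t):
    """Return the arm in `arms` with the largest UCB1 index (untried arms first)."""
    for a in arms:
        if counts[a] == 0:
            return a
    return max(arms, key=lambda a: sums[a] / counts[a]
               + math.sqrt(2 * math.log(t) / counts[a]))


def doubling_epoch_sketch(means, n_agents, horizon, seed=0):
    """Hypothetical doubling-epoch collaboration; returns (total regret, rounds, bits)."""
    rng = random.Random(seed)
    k = len(means)
    best = max(means)
    # Assumption: agent i initially explores arms i, i+N, i+2N, ... (about K/N arms).
    active = [set(range(i, k, n_agents)) for i in range(n_agents)]
    counts = [[0] * k for _ in range(n_agents)]
    sums = [[0.0] * k for _ in range(n_agents)]
    bits_per_msg = max(1, math.ceil(math.log2(k)))  # one arm index per message
    t, epoch, rounds, bits, regret = 0, 0, 0, 0, 0.0
    while t < horizon:
        length = min(2 ** epoch, horizon - t)  # epoch lengths 1, 2, 4, ...
        for step in range(length):
            for i in range(n_agents):
                a = ucb_pick(counts[i], sums[i], sorted(active[i]), t + step + 1)
                reward = 1.0 if rng.random() < means[a] else 0.0
                counts[i][a] += 1
                sums[i][a] += reward
                regret += best - means[a]  # pseudo-regret, summed over agents
        t += length
        epoch += 1
        # Communicate only at the end of the epoch: each agent broadcasts
        # the index of its most-played arm, and every agent adds the
        # received indices to its active set.
        msgs = [max(sorted(active[i]), key=lambda a: counts[i][a])
                for i in range(n_agents)]
        rounds += 1
        bits += n_agents * bits_per_msg
        for i in range(n_agents):
            active[i] |= set(msgs)
    return regret, rounds, bits


means = [0.1 + 0.8 * j / 31 for j in range(32)]  # hypothetical arm means, K = 32
regret, rounds, bits = doubling_epoch_sketch(means, n_agents=4, horizon=1000)
print(rounds, bits)  # rounds == 10, bits == 4 agents * 5 bits * 10 rounds == 200
```

With T = 1000 the epochs have lengths 1, 2, ..., 256 and then a truncated final epoch of 489 steps, so there are ceil(log2(T + 1)) = 10 communication rounds, each costing 5 bits per agent for K = 32. In this sketch the communication cost grows logarithmically in T, independent of how many pulls happen inside each epoch.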

Keywords

limited communication, multi-agent, multi-armed