Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning
CoRR (2024)
Abstract
We present the first study on provably efficient randomized exploration in
cooperative multi-agent reinforcement learning (MARL). We propose a unified
algorithmic framework for randomized exploration in parallel Markov Decision
Processes (MDPs), together with two Thompson Sampling (TS)-type algorithms,
CoopTS-PHE and CoopTS-LMC, which incorporate the perturbed-history exploration
(PHE) strategy and the Langevin Monte Carlo (LMC) exploration strategy,
respectively; both are flexible in design and easy to implement in practice.
For a special class of
parallel MDPs where the transition is (approximately) linear, we theoretically
prove that both CoopTS-PHE and CoopTS-LMC achieve a
$\mathcal{O}(d^{3/2} H^2 \sqrt{MK})$ regret bound with communication
complexity $\mathcal{O}(d H M^2)$, where $d$ is the feature dimension, $H$ is
the horizon length, $M$ is the number of agents, and $K$ is the number of
episodes. This is the first theoretical result for randomized
exploration in cooperative MARL. We evaluate our proposed method on multiple
parallel RL environments, including a deep exploration problem (i.e.,
N-chain), a video game, and a real-world problem in energy systems. Our
experimental results show that our framework achieves better performance,
even under misspecified transition models. Additionally, we
establish a connection between our unified framework and the practical
application of federated learning.
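
The abstract only names the two randomization mechanisms, so below is a minimal, self-contained sketch of what PHE-style and LMC-style sampling look like in a toy linear-regression setting. This is not the authors' CoopTS-PHE/CoopTS-LMC implementation (which operates on parallel MDPs with inter-agent communication); all function names, hyperparameters, and the ridge-regression objective are assumptions made purely for illustration.

```python
# Hypothetical sketch of the two randomized exploration mechanisms named in
# the abstract, applied to a toy linear-regression estimate. Not the paper's
# CoopTS algorithms; names and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def phe_sample(X, y, noise_scale=1.0, reg=1.0):
    """Perturbed-history exploration (PHE): add i.i.d. noise to the observed
    rewards, then refit a ridge regression on the perturbed history. The
    result is a randomized estimate centered near the least-squares fit."""
    y_pert = y + noise_scale * rng.standard_normal(y.shape)
    A = X.T @ X + reg * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y_pert)

def lmc_sample(X, y, step=1e-3, n_steps=200, temperature=1.0, reg=1.0):
    """Langevin Monte Carlo (LMC) exploration: run noisy gradient descent on
    the ridge-regression loss; the injected Gaussian noise makes the iterates
    behave like approximate samples from a Gibbs posterior."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = X.T @ (X @ theta - y) + reg * theta            # loss gradient
        noise = np.sqrt(2.0 * step * temperature) * rng.standard_normal(theta.shape)
        theta = theta - step * grad + noise                    # Langevin step
    return theta

# Toy usage: both routines return randomized estimates of the same unknown
# parameter vector.
X = rng.standard_normal((100, 5))
theta_true = rng.standard_normal(5)
y = X @ theta_true + 0.1 * rng.standard_normal(100)
print("PHE sample:", phe_sample(X, y))
print("LMC sample:", lmc_sample(X, y))
```

Acting greedily with respect to a fresh randomized estimate in each episode is what gives both strategies their TS flavor without maintaining an explicit posterior distribution.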