Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning
arXiv (2024)
Abstract
This work designs and analyzes a novel set of algorithms for multi-agent
reinforcement learning (MARL) based on the principle of information-directed
sampling (IDS). These algorithms draw inspiration from foundational concepts in
information theory and are proven to be sample-efficient in MARL settings such
as two-player zero-sum Markov games (MGs) and multi-player general-sum MGs. For
episodic two-player zero-sum MGs, we present three sample-efficient algorithms
for learning a Nash equilibrium. The basic algorithm, referred to as MAIDS,
employs an asymmetric learning structure where the max-player first solves a
minimax optimization problem based on the joint information ratio of the joint
policy, and the min-player then minimizes the marginal information ratio with
the max-player's policy fixed. Theoretical analyses show that it achieves a
Bayesian regret of $\tilde{O}(\sqrt{K})$ over $K$ episodes. To reduce the
computational load of MAIDS, we develop an improved algorithm called Reg-MAIDS,
which attains the same Bayesian regret bound at lower computational
complexity. Moreover, by leveraging the flexibility of the IDS principle in
choosing the learning target, we propose two methods for constructing
compressed environments based on rate-distortion theory, upon which we develop
an algorithm, Compressed-MAIDS, in which the learning target is a compressed
environment. Finally, we extend Reg-MAIDS to multi-player general-sum MGs and
prove that it can learn either a Nash equilibrium or a coarse correlated
equilibrium in a sample-efficient manner.
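
For readers unfamiliar with IDS, the sketch below recalls how an information ratio is typically defined and why bounding it yields regret of order $\sqrt{K}$. It uses the classical single-agent form, not the paper's joint and marginal ratios for Markov games, which may be defined differently. The symbols $\Delta_k$ (expected regret of policy $\pi$ in episode $k$), $\mathbb{I}_k$ (information gained about the learning target $\chi$ in episode $k$), and $\bar{\Gamma}$ (a uniform bound on the ratio) are assumptions introduced here for illustration.

```latex
% Classical single-agent IDS quantities, shown for illustration only.
% Assumed notation: \Delta_k = expected regret, \mathbb{I}_k = information
% gain about the learning target \chi, \bar{\Gamma} = uniform ratio bound.
\[
  \Gamma_k(\pi) \;=\; \frac{\bigl(\mathbb{E}_k[\Delta_k(\pi)]\bigr)^{2}}{\mathbb{I}_k(\pi)}
\]
% If \Gamma_k(\pi_k) \le \bar{\Gamma} in every episode, then Cauchy-Schwarz
% and the chain rule for mutual information give
\[
  \sum_{k=1}^{K} \mathbb{E}[\Delta_k(\pi_k)]
  \;=\; \sum_{k=1}^{K} \sqrt{\Gamma_k(\pi_k)\,\mathbb{I}_k(\pi_k)}
  \;\le\; \sqrt{\bar{\Gamma}\, K \sum_{k=1}^{K} \mathbb{I}_k(\pi_k)}
  \;\le\; \sqrt{\bar{\Gamma}\, H(\chi)\, K}.
\]
```

In this form, the choice of learning target $\chi$ enters the bound only through its entropy $H(\chi)$, which is one way to see why a compressed target with lower entropy could be attractive.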