Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning
CoRR (2024)
Abstract
One of the notorious issues for Reinforcement Learning (RL) is poor sample
efficiency. Compared to single-agent RL, sample efficiency in Multi-Agent
Reinforcement Learning (MARL) is even more challenging because of its inherent
partial observability, non-stationary training, and enormous strategy space.
Although much effort has been devoted to developing new methods to enhance
sample efficiency, we instead examine the widely used episodic training
mechanism: in each training step, tens of frames are collected, but only one
gradient step is taken. We argue that this episodic training can be a source
of poor sample efficiency. To better exploit the data already collected, we propose to
increase the frequency of the gradient updates per environment interaction
(a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we
evaluate 3 MARL methods on 6 SMAC tasks. The empirical results validate
that a higher replay ratio significantly improves the sample efficiency for
MARL algorithms. The codes to reimplement the results presented in this paper
are open-sourced at https://anonymous.4open.science/r/rr_for_MARL-0D83/.
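The core idea of a higher replay ratio can be illustrated with a minimal off-policy training loop. The sketch below is a hypothetical, framework-free illustration (the buffer, transitions, and update placeholder are our own assumptions, not the paper's implementation): instead of one gradient update per collected frame, `replay_ratio` updates are performed per environment interaction.

```python
import random


class ReplayBuffer:
    """Minimal FIFO replay buffer (illustrative, not the paper's code)."""

    def __init__(self, capacity=10_000):
        self.storage = []
        self.capacity = capacity

    def add(self, transition):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append(transition)

    def sample(self, batch_size):
        return random.sample(self.storage, min(batch_size, len(self.storage)))


def train(num_env_steps=100, replay_ratio=4, batch_size=8):
    """Perform `replay_ratio` gradient updates per environment interaction.

    The transition and the update are placeholders; a real MARL agent
    would step its environment and take an actual gradient step.
    Returns the total number of updates performed.
    """
    buffer = ReplayBuffer()
    total_updates = 0
    for step in range(num_env_steps):
        transition = (step, random.random())  # placeholder transition
        buffer.add(transition)
        # Key idea: multiple gradient updates per collected frame,
        # rather than the usual single update of episodic training.
        for _ in range(replay_ratio):
            batch = buffer.sample(batch_size)
            total_updates += 1  # stand-in for one gradient step on `batch`
    return total_updates
```

With `replay_ratio=1` this reduces to the standard one-update-per-frame scheme; raising the ratio multiplies how often each stored transition can be reused without collecting any additional data.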