Simple Ingredients for Offline Reinforcement Learning
arXiv (2024)
Abstract
Offline reinforcement learning algorithms have proven effective on datasets
highly connected to the target downstream task. Yet, leveraging a novel testbed
(MOOD) in which trajectories come from heterogeneous sources, we show that
existing methods struggle with diverse data: their performance considerably
deteriorates as data collected for related but different tasks is simply added
to the offline buffer. In light of this finding, we conduct a large empirical
study where we formulate and test several hypotheses to explain this failure.
Surprisingly, we find that scale, more than algorithmic considerations, is the
key factor influencing performance. We show that simple methods like AWAC and
IQL with increased network size overcome the paradoxical failure modes from the
inclusion of additional data in MOOD, and notably outperform prior
state-of-the-art algorithms on the canonical D4RL benchmark.
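The abstract's central claim is that "simple methods like AWAC and IQL with increased network size" suffice once scale is treated as the key ingredient. As a rough illustration only (not the authors' implementation), the sketch below pairs an AWAC-style advantage-weighted actor loss and an IQL-style expectile value loss with an MLP whose width is a single scaling knob; the dimensions, widths, temperature `beta`, and expectile `tau` are all illustrative assumptions rather than values from the paper.

```python
# Minimal sketch of the two "simple ingredients" named in the abstract:
# advantage-weighted regression (AWAC-style), expectile value regression
# (IQL-style), and a network whose size is easy to scale up.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


def make_mlp(in_dim: int, out_dim: int, width: int = 1024, depth: int = 3) -> nn.Sequential:
    """Plain MLP; `width` is the "network size" knob the abstract highlights."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)


def awac_actor_loss(log_prob: torch.Tensor, advantage: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """AWAC-style update: maximize E[exp(A/beta) * log pi(a|s)] on dataset actions."""
    weights = torch.exp(advantage / beta).clamp(max=100.0)  # clip weights for stability
    return -(weights.detach() * log_prob).mean()


def iql_expectile_loss(value: torch.Tensor, target_q: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    """IQL-style expectile regression: L(u) = |tau - 1(u < 0)| * u^2, u = Q - V."""
    diff = target_q - value
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()


# Scaling up is a one-argument change (dimensions here are illustrative).
obs_dim, act_dim = 17, 6
actor = make_mlp(obs_dim, act_dim, width=2048)  # "increased network size"
value_net = make_mlp(obs_dim, 1, width=2048)
```

Under this reading, the "simple ingredients" result amounts to keeping the standard AWAC/IQL losses unchanged and widening the networks, rather than modifying the algorithms themselves.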