Transferable Reinforcement Learning via Generalized Occupancy Models
arXiv (2024)
Abstract
Intelligent agents must be generalists - able to quickly adapt and
generalize to varying tasks. Within the framework of reinforcement learning
(RL), model-based RL algorithms learn a task-agnostic dynamics model of the
world, in principle allowing them to generalize to arbitrary rewards. However,
one-step models naturally suffer from compounding errors, making them
ineffective for problems with long horizons and large state spaces. In this
work, we propose a novel class of models - generalized occupancy models (GOMs)
- that retain the generality of model-based RL while avoiding compounding
error. The key idea behind GOMs is to model the distribution of all possible
long-term outcomes from a given state under the coverage of a stationary
dataset, along with a policy that realizes a particular outcome from the given
state. These models can then quickly be used to select the optimal action for
arbitrary new tasks, without having to redo policy optimization. By directly
modeling long-term outcomes, GOMs avoid compounding error while retaining
generality across arbitrary reward functions. We provide a practical
instantiation of GOMs using diffusion models and show its efficacy as a new
class of transferable models, both theoretically and empirically across a
variety of simulated robotics problems. Videos and code at
https://weirdlabuw.github.io/gom/.
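To make the transfer mechanism concrete, here is a minimal sketch in Python. It assumes (as an illustration, not the authors' API) that a long-term outcome is a discounted feature sum psi = sum_t gamma^t phi(s_t), that a task reward is approximately linear in the features, r(s) = w . phi(s), and that we already have a learned outcome sampler and a readout policy; `sample_outcomes` and `readout_policy` below are hypothetical stubs standing in for those learned models.

```python
# Hypothetical sketch of GOM-style action selection; names and shapes are
# assumptions, not the paper's implementation. If outcomes psi are discounted
# feature sums and rewards are linear in the features (r = w . phi), then the
# return of an outcome is w . psi, so solving a new task reduces to scoring
# sampled outcomes and realizing the best one with the readout policy.
import numpy as np

rng = np.random.default_rng(0)
FEATURE_DIM, ACTION_DIM, N_CANDIDATES = 8, 2, 64

def sample_outcomes(state, n):
    """Stand-in for the learned (e.g. diffusion) outcome model p(psi | s).
    Here it returns random candidates, purely for illustration."""
    return rng.normal(size=(n, FEATURE_DIM))

def readout_policy(state, psi):
    """Stand-in for the learned readout policy pi(a | s, psi) that
    produces an action realizing the outcome psi from this state."""
    return np.tanh(psi[:ACTION_DIM])  # dummy mapping to an action

def select_action(state, w):
    """Given task weights w (reward r(s) = w . phi(s)), score each
    candidate outcome by w . psi and act toward the best one."""
    candidates = sample_outcomes(state, N_CANDIDATES)
    best = candidates[np.argmax(candidates @ w)]
    return readout_policy(state, best)

state = np.zeros(4)               # placeholder state
w = rng.normal(size=FEATURE_DIM)  # task vector for the new reward
print(select_action(state, w))
```

Note that the task enters only through w: under this reading, swapping in a new reward changes the scoring vector while the outcome model and readout policy are reused unchanged, which is what lets GOMs skip re-running policy optimization per task.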