Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows
arXiv (2024)
Abstract
Despite notable successes of Reinforcement Learning (RL), the prevalent use
of an online learning paradigm prevents its widespread adoption, especially in
hazardous or costly scenarios. Offline RL has emerged as an alternative
solution, learning from pre-collected static datasets. However, this offline
learning introduces a new challenge known as distributional shift, degrading
the performance when the policy is evaluated on scenarios that are
Out-Of-Distribution (OOD) with respect to the training dataset. Most existing
offline RL methods resolve this issue by regularizing policy learning to stay
within the support of the given dataset. However, such regularization overlooks the
potential for high-reward regions that may exist beyond the dataset. This
motivates exploring novel offline learning techniques that can make
improvements beyond the data support without compromising policy performance,
potentially by learning causation (cause-and-effect) instead of correlation
from the dataset. In this paper, we propose the MOOD-CRL (Model-based Offline
OOD-Adapting Causal RL) algorithm, which addresses the challenge of
extrapolation in offline policy training through causal inference rather than
policy regularization. Specifically, a Causal Normalizing Flow (CNF) is
developed to learn the transition and reward functions, enabling data
generation and augmentation for offline policy evaluation and training. Based on the
data-invariant, physics-based qualitative causal graph and the observational
data, we develop a novel learning scheme for the CNF to learn the quantitative
structural causal model. As a result, the CNF gains predictive and
counterfactual reasoning capabilities for sequential decision-making tasks,
demonstrating strong potential for OOD adaptation. Our CNF-based offline RL
approach is validated
through empirical evaluations, outperforming model-free and model-based methods
by a significant margin.
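To make the CNF idea concrete, here is a minimal sketch of a causal normalizing flow with an abduction-action-prediction counterfactual step. It assumes PyTorch, a known causal DAG whose variables are already topologically sorted (strictly lower-triangular adjacency matrix), and hypothetical names (`CausalAffineFlow`, `_scale_shift`, `counterfactual`); it illustrates the general technique the abstract describes, not the paper's actual MOOD-CRL implementation.

```python
# A minimal sketch, assuming a PyTorch backend and a causal DAG whose
# variables are topologically sorted (adjacency strictly lower-triangular).
# All names here are hypothetical illustrations, not the paper's code.
import math
import torch
import torch.nn as nn


class CausalAffineFlow(nn.Module):
    """One affine layer of a causal normalizing flow.

    Each variable is modeled as x_i = u_i * exp(s_i) + t_i, where the scale
    s_i and shift t_i depend only on x_i's causal parents, so the learned
    density factorizes along the given causal graph.
    """

    def __init__(self, adjacency: torch.Tensor, hidden: int = 32):
        super().__init__()
        self.d = adjacency.shape[0]
        # adjacency[i, j] = 1 iff x_j is a direct cause (parent) of x_i.
        self.register_buffer("mask", adjacency.float())
        # One small conditioner per variable, fed only that variable's parents.
        self.cond = nn.ModuleList(
            nn.Sequential(nn.Linear(self.d, hidden), nn.Tanh(), nn.Linear(hidden, 2))
            for _ in range(self.d)
        )

    def _scale_shift(self, x):
        # Zero out non-parents before predicting (s_i, t_i) for each variable.
        st = torch.stack(
            [self.cond[i](x * self.mask[i]) for i in range(self.d)], dim=1
        )  # (batch, d, 2)
        return st[..., 0], st[..., 1]

    def log_prob(self, x):
        # Density for maximum-likelihood training, with a standard-normal base.
        s, t = self._scale_shift(x)
        u = (x - t) * torch.exp(-s)
        base = -0.5 * (u.pow(2) + math.log(2 * math.pi)).sum(-1)
        return base - s.sum(-1)  # log|det du/dx| = -sum_i s_i

    @torch.no_grad()
    def counterfactual(self, x_obs, index, value):
        """Abduction-action-prediction: what if x_index had been `value`?"""
        s, t = self._scale_shift(x_obs)
        u = (x_obs - t) * torch.exp(-s)     # abduction: recover exogenous noise
        x = torch.zeros_like(x_obs)
        for i in range(self.d):             # prediction in causal order
            if i == index:
                x[:, i] = value             # action: do(x_index = value)
            else:
                s, t = self._scale_shift(x)
                x[:, i] = u[:, i] * torch.exp(s[:, i]) + t[:, i]
        return x
```

Under these assumptions, training would maximize `log_prob` over offline transitions (with states, actions, rewards, and next states as nodes of the causal graph), and `counterfactual` could then synthesize augmented transitions by intervening on, for instance, the action variable; stacking several such layers yields a more expressive flow.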