Offline Reinforcement Learning with Diffusion-Based Behavior Cloning Term

KSEM (4)(2023)

Abstract
To address the distributional shift problem in offline reinforcement learning, policy constraint methods aim to minimize the divergence between the current policy and the behavior policy. One class of policy constraint methods, regularization constraint methods, adds regularization terms to online reinforcement learning algorithms. However, some of these regularization terms are overly restrictive and are limited by the expressive power of the generative model. To relax the strict distribution-matching constraint, this paper proposes the TD3 + diffusion-based BC algorithm, which uses a behavior cloning term built on a diffusion model as the regularization constraint. The diffusion model has strong expressive power and achieves support-set matching: it learns to produce actions that have high probability in a given state while avoiding out-of-distribution actions. Experimental results show that our algorithm matches or surpasses state-of-the-art algorithms on most tasks in the D4RL benchmark.
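A minimal sketch of the idea the abstract describes is given below, assuming a TD3+BC-style trade-off between Q-value maximization and a diffusion denoising loss on dataset actions. This is not the authors' code: the class and function names (DiffusionPolicy, policy_objective), the linear noise schedule, the timestep embedding, and the weighting lam are illustrative assumptions.

```python
# Sketch (not the authors' implementation): a TD3-style Q-maximization objective
# regularized by a diffusion-model behavior-cloning (denoising) term, so the
# policy stays on the support of the dataset actions.
import torch
import torch.nn as nn

class DiffusionPolicy(nn.Module):
    """Conditional noise predictor eps_theta(s, a_t, t), used both to sample
    actions via reverse diffusion and to compute the BC denoising loss."""
    def __init__(self, state_dim, action_dim, T=50, hidden=256):
        super().__init__()
        self.T, self.action_dim = T, action_dim
        betas = torch.linspace(1e-4, 0.02, T)                     # assumed schedule
        self.register_buffer("alphas", 1.0 - betas)
        self.register_buffer("alphas_bar", torch.cumprod(1.0 - betas, dim=0))
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def eps(self, state, noisy_action, t):
        t_emb = t.float().view(-1, 1) / self.T                    # crude timestep embedding
        return self.net(torch.cat([state, noisy_action, t_emb], dim=-1))

    def bc_loss(self, state, action):
        """Standard DDPM denoising loss on dataset (s, a) pairs: the
        diffusion-based behavior-cloning regularizer."""
        t = torch.randint(0, self.T, (state.shape[0],), device=state.device)
        noise = torch.randn_like(action)
        a_bar = self.alphas_bar[t].unsqueeze(-1)
        noisy = a_bar.sqrt() * action + (1 - a_bar).sqrt() * noise
        return ((self.eps(state, noisy, t) - noise) ** 2).mean()

    def sample(self, state):
        """Reverse diffusion from Gaussian noise to an action, conditioned on state."""
        a = torch.randn(state.shape[0], self.action_dim, device=state.device)
        for i in reversed(range(self.T)):
            t = torch.full((state.shape[0],), i, device=state.device)
            alpha, a_bar = self.alphas[i], self.alphas_bar[i]
            a = (a - (1 - alpha) / (1 - a_bar).sqrt() * self.eps(state, a, t)) / alpha.sqrt()
            if i > 0:
                a = a + (1 - alpha).sqrt() * torch.randn_like(a)
        return a.clamp(-1.0, 1.0)

def policy_objective(policy, critic, states, actions, lam=2.5):
    """TD3+BC-style trade-off: maximize Q of sampled actions while the diffusion
    BC term keeps sampled actions near the data support."""
    q = critic(states, policy.sample(states))
    # Normalizing by mean |Q| (as in TD3+BC) keeps lam on a consistent scale.
    return policy.bc_loss(states, actions) - lam * q.mean() / q.abs().mean().detach()
```

The normalization of the Q term by its mean absolute value follows the TD3+BC convention so that the single coefficient lam controls the trade-off between value maximization and the diffusion BC regularizer; the exact weighting used in the paper is not specified in the abstract.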
Keywords
learning, behavior, diffusion-based