Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
arXiv (2023)
Abstract
Score-based generative models such as the diffusion model have been shown to
be effective at modeling multi-modal data, from image generation to
reinforcement learning (RL). However, the inference process of a diffusion
model can be slow due to its iterative sampling, which hinders its use in RL.
We propose to apply the consistency model as an efficient yet expressive
policy representation, namely the consistency policy, with an actor-critic
style algorithm for three typical RL settings: offline, offline-to-online, and
online. For offline RL, we demonstrate the expressiveness of generative models
as policies on multi-modal data. For offline-to-online RL, the consistency
policy is shown to be more computationally efficient than the diffusion
policy, with comparable performance. For online RL, the consistency policy
demonstrates a significant speedup and even higher average performance than
the diffusion policy.
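The efficiency claim rests on the sampling cost at inference time: a diffusion policy must iterate many denoising steps per action, while a consistency policy maps noise to an action in a single network evaluation. A minimal toy sketch of this contrast, where `denoise_step` and `consistency_fn` are hypothetical stand-ins for the trained networks (not the paper's actual models):

```python
import numpy as np

# Toy illustration: count the model evaluations a diffusion policy needs
# per action versus a consistency policy. The "networks" below are
# hypothetical placeholders, not the paper's trained models.

rng = np.random.default_rng(0)
ACTION_DIM = 2
calls = {"diffusion": 0, "consistency": 0}

def denoise_step(state, noisy_action, t):
    """One reverse-diffusion step (stand-in for a score network)."""
    calls["diffusion"] += 1
    return noisy_action * 0.9  # nudge the sample toward clean actions

def consistency_fn(state, noisy_action, t):
    """Consistency mapping: noise -> clean action in one evaluation."""
    calls["consistency"] += 1
    return noisy_action * 0.1

def diffusion_policy(state, n_steps=20):
    a = rng.standard_normal(ACTION_DIM)
    for t in range(n_steps, 0, -1):  # iterative sampling: n_steps evals
        a = denoise_step(state, a, t)
    return a

def consistency_policy(state):
    a = rng.standard_normal(ACTION_DIM)
    return consistency_fn(state, a, t=1)  # single network evaluation

state = np.zeros(4)
diffusion_policy(state)
consistency_policy(state)
print(calls)  # 20 evaluations for diffusion vs 1 for consistency
```

Under these assumptions, sampling one action costs 20 network calls for the diffusion policy but only 1 for the consistency policy, which is the source of the speedup reported in the online and offline-to-online settings.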