Model-based Reinforcement Learning for Parameterized Action Spaces
arXiv (2024)
Abstract
We propose a novel model-based reinforcement learning algorithm – Dynamics
Learning and predictive control with Parameterized Actions (DLPA) – for
Parameterized Action Markov Decision Processes (PAMDPs). The agent learns a
parameterized-action-conditioned dynamics model and plans with a modified Model
Predictive Path Integral control. We theoretically quantify the difference
between the generated trajectory and the optimal trajectory during planning in
terms of the value they achieve, through the lens of Lipschitz continuity. Our
empirical results on several standard benchmarks show that our algorithm
achieves better sample efficiency and asymptotic performance than
state-of-the-art PAMDP methods.
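
To make the planning idea concrete, here is a minimal sketch of sampling-based planning over parameterized actions in the spirit of Model Predictive Path Integral control. It is not the DLPA algorithm itself: the abstract does not give the update rules, so the softmax-weighted refit below is a generic assumption. The 1-D task, `toy_model`, `GOAL`, `plan`, and all hyperparameters are hypothetical, and a hand-written function stands in for the learned dynamics model. Each action is a pair (k, z): a discrete choice k (0 = left, 1 = right) and a continuous parameter z in [0, 1] (step size).

```python
import numpy as np

GOAL = 3.0  # hypothetical target position for the toy task


def toy_model(x, k, z):
    """Hypothetical stand-in for a learned dynamics and reward model.

    k is the discrete action (0 = left, 1 = right); z is its step size.
    """
    direction = np.where(k == 1, 1.0, -1.0)
    x_next = x + direction * z
    reward = -np.abs(x_next - GOAL)
    return x_next, reward


def plan(x0, horizon=5, n_samples=256, n_iters=6, temperature=0.5, seed=0):
    """Return the first (k, z) of a plan found by weighted sampling."""
    rng = np.random.default_rng(seed)
    n_k = 2
    # Per time step: categorical over k, and one Gaussian over z for each k.
    probs = np.full((horizon, n_k), 1.0 / n_k)
    mean = np.full((horizon, n_k), 0.5)
    std = np.full((horizon, n_k), 0.3)
    t_idx = np.arange(horizon)

    for _ in range(n_iters):
        # Sample discrete actions, then parameters conditioned on them.
        u = rng.random((n_samples, horizon))
        k = (u > probs[:, 0]).astype(int)
        z = np.clip(rng.normal(mean[t_idx, k], std[t_idx, k]), 0.0, 1.0)

        # Roll every sampled action sequence through the model.
        x = np.full(n_samples, float(x0))
        returns = np.zeros(n_samples)
        for t in range(horizon):
            x, r = toy_model(x, k[:, t], z[:, t])
            returns += r

        # Exponentially weight trajectories by return (softmax weights).
        w = np.exp((returns - returns.max()) / temperature)
        w /= w.sum()

        # Refit both distributions to the weighted samples.
        for j in range(n_k):
            wj = w[:, None] * (k == j)
            mass = wj.sum(axis=0)
            probs[:, j] = mass
            ok = mass > 1e-8  # skip steps where action j has no weight
            mean[ok, j] = (wj * z).sum(axis=0)[ok] / mass[ok]
            var = (wj * (z - mean[:, j]) ** 2).sum(axis=0)[ok] / mass[ok]
            std[ok, j] = np.sqrt(var) + 1e-3
        probs = np.clip(probs, 1e-3, None)
        probs /= probs.sum(axis=1, keepdims=True)

    k0 = int(np.argmax(probs[0]))
    return k0, float(mean[0, k0])


k0, z0 = plan(x0=0.0)
print("first action:", k0, "parameter:", z0)
```

In a receding-horizon loop, only the returned first action would be executed before replanning from the next state. Keeping a separate Gaussian per discrete action is one simple way to respect the dependence of the continuous parameter on the discrete choice.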