Safe Exploration Using Bayesian World Models and Log-Barrier Optimization
arXiv (2024)
Abstract
A major challenge in deploying reinforcement learning in online tasks is
ensuring that safety is maintained throughout the learning process. In this
work, we propose CERL, a new method for solving constrained Markov decision
processes (CMDPs) while keeping the policy safe during learning. Our method
leverages Bayesian world models and suggests policies that are pessimistic
with respect to the model's epistemic uncertainty. This makes CERL robust to
model inaccuracies and leads to safe exploration during learning. In our
experiments, we demonstrate that CERL outperforms the current state of the
art in terms of safety and optimality in solving CMDPs from image
observations.
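The abstract gives no implementation details, but the title names the two core ingredients: epistemic uncertainty from a Bayesian world model, and a log-barrier treatment of the CMDP constraint. The sketch below shows one plausible way these combine; the function names, the use of posterior cost samples, the barrier weight `eta`, and the cost `budget` are illustrative assumptions, not the authors' actual algorithm.

```python
import numpy as np

def pessimistic_cost(cost_samples: np.ndarray) -> float:
    """Worst-case constraint cost over posterior samples of the world model.

    cost_samples: shape (K,), one predicted rollout cost per sampled model.
    Taking the max is one simple way to be pessimistic with respect to
    epistemic uncertainty (an illustrative choice, not necessarily the paper's).
    """
    return float(np.max(cost_samples))

def log_barrier_objective(reward: float, cost: float, budget: float,
                          eta: float = 0.1) -> float:
    """Expected reward plus a log barrier keeping cost strictly below budget."""
    slack = budget - cost
    if slack <= 0:
        # The policy violates the budget under the pessimistic model:
        # treat it as infeasible.
        return -np.inf
    return reward + eta * np.log(slack)

# Toy usage: K = 5 posterior samples of the predicted constraint cost.
rng = np.random.default_rng(0)
cost_samples = rng.uniform(0.2, 0.8, size=5)
objective = log_barrier_objective(reward=1.3,
                                  cost=pessimistic_cost(cost_samples),
                                  budget=1.0)
print(objective)
```

A policy search would then maximize this objective over candidate policies. Because the barrier term diverges as the pessimistic cost approaches the budget, candidate policies stay strictly inside the feasible region during learning; annealing `eta` toward zero recovers the constrained optimum, which is the standard interior-point rationale for log barriers.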