Model-Based Offline Reinforcement Learning with Uncertainty Estimation and Policy Constraint

IEEE Transactions on Artificial Intelligence (2024)

Abstract
Explicit uncertainty estimation is an effective way to address the overestimation caused by distribution shift in offline RL. However, the common bootstrapped-ensemble approach fails to produce reliable uncertainty estimates, which degrades the performance of offline RL. Compared with model-free offline RL, model-based offline RL offers better generalization, although it is limited by the model-bias problem. The adverse effects of model bias are aggravated by the state-mismatch phenomenon, which ultimately disrupts policy learning. In this paper, we propose the Model-based Offline RL with Uncertainty estimation and Policy constraint (MOUP) algorithm to obtain reliable uncertainty estimates and bounded state mismatch. First, we introduce MC dropout into ensemble networks and propose ensemble dropout networks for uncertainty estimation. Second, we present a novel policy constraint method that incorporates a maximum mean discrepancy (MMD) constraint into policy optimization, and we prove that this method yields bounded state mismatch. Finally, we evaluate MOUP on the MuJoCo control toolkit. Experimental results show that the proposed MOUP algorithm is competitive with existing offline RL algorithms.
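The abstract names two concrete mechanisms: ensemble dropout networks (MC dropout applied to each member of a dynamics-model ensemble, with the spread of stochastic forward passes serving as the uncertainty signal) and a maximum mean discrepancy penalty between policy actions and dataset actions. The sketch below illustrates both ideas under assumptions of our own; the class and function names (DropoutDynamicsModel, EnsembleDropoutModel, mmd_penalty), network sizes, dropout rate, number of MC samples, and kernel bandwidth are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn


class DropoutDynamicsModel(nn.Module):
    """One ensemble member: an MLP dynamics model with MC dropout layers."""

    def __init__(self, state_dim, action_dim, hidden=200, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, state_dim),  # predicts the next-state delta
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


class EnsembleDropoutModel(nn.Module):
    """Ensemble of dropout dynamics models; uncertainty = spread of MC samples."""

    def __init__(self, state_dim, action_dim, n_models=5, n_mc=10):
        super().__init__()
        self.members = nn.ModuleList(
            [DropoutDynamicsModel(state_dim, action_dim) for _ in range(n_models)]
        )
        self.n_mc = n_mc

    @torch.no_grad()
    def uncertainty(self, state, action):
        # Keep dropout active (train mode) so every forward pass is stochastic.
        self.train()
        preds = torch.stack([
            m(state, action) for m in self.members for _ in range(self.n_mc)
        ])  # shape: (n_models * n_mc, batch, state_dim)
        # Predictive std across all stochastic members, averaged over state dims.
        return preds.std(dim=0).mean(dim=-1)  # shape: (batch,)


def mmd_penalty(pi_actions, data_actions, sigma=10.0):
    """Gaussian-kernel MMD^2 between policy actions and dataset actions.

    Both inputs are (batch, action_dim); sigma is an assumed bandwidth.
    """
    def kernel(x, y):
        d2 = torch.cdist(x, y).pow(2)
        return torch.exp(-d2 / (2.0 * sigma))

    return (kernel(pi_actions, pi_actions).mean()
            - 2.0 * kernel(pi_actions, data_actions).mean()
            + kernel(data_actions, data_actions).mean())
```

In a model-based actor-critic loop, one would typically penalize rewards from model rollouts by the uncertainty term and add the MMD value, scaled by a fixed coefficient or a Lagrange multiplier, to the policy loss; the exact weighting and the form in which MOUP combines the two terms are not specified in the abstract.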
Keywords
Model-based offline reinforcement learning, uncertainty estimation, MC dropout, policy constraint