MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator
IJCAI 2024 (2024)
Abstract
Offline reinforcement learning (RL) faces the significant challenge of distribution shift. Model-free offline RL tackles this problem by penalizing the Q value for out-of-distribution (OOD) data or by constraining the policy to stay close to the behavior policy, but this inhibits exploration of the OOD region. Model-based offline RL, which uses a trained environment model to generate more OOD data and performs conservative policy optimization within that model, has become an effective approach. However, current model-based algorithms rarely consider agent robustness when incorporating conservatism into the policy. Therefore, a new model-based offline algorithm with a conservative Bellman operator (MICRO) is proposed. This method trades off performance and robustness by introducing the robust Bellman operator into the algorithm. Compared with previous model-based algorithms that rely on robust adversarial models, MICRO significantly reduces computation cost by choosing only the minimal Q value within the state uncertainty set. Extensive experiments demonstrate that MICRO outperforms prior RL algorithms on offline RL benchmarks and is considerably robust to adversarial perturbations.
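The core idea described above, backing up the minimal Q value over a state uncertainty set instead of training an adversarial model, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the Gaussian-ball uncertainty set, the function names, and all parameters (`noise_scale`, `n_samples`) are assumptions chosen for clarity.

```python
import numpy as np

def conservative_bellman_target(q_fn, reward, next_state, gamma=0.99,
                                noise_scale=0.1, n_samples=8, rng=None):
    """Illustrative sketch of a conservative Bellman backup:
    sample perturbed next states from an uncertainty set around the
    model-predicted next state and back up the minimal Q value among
    them, rather than optimizing an adversarial dynamics model."""
    rng = np.random.default_rng(rng)
    # Assumed uncertainty set: a Gaussian ball around next_state.
    perturbed = next_state + noise_scale * rng.standard_normal(
        (n_samples, next_state.shape[-1]))
    # Include the nominal next state itself in the candidate set.
    candidates = np.concatenate([next_state[None, :], perturbed], axis=0)
    # Conservative backup: minimum Q over the uncertainty set.
    return reward + gamma * min(q_fn(s) for s in candidates)

# Usage with a toy Q function (hypothetical):
q = lambda s: float(-np.sum(s ** 2))
target = conservative_bellman_target(q, reward=1.0,
                                     next_state=np.array([0.5, -0.2]),
                                     rng=0)
```

Because the nominal next state is always in the candidate set, this target is never larger than the standard Bellman target, which is what makes the backup conservative while needing only a few extra Q evaluations instead of an inner adversarial optimization.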
Keywords
Machine Learning -> ML: Reinforcement learning; Machine Learning -> ML: Model-based and model learning reinforcement learning; Machine Learning -> ML: Offline reinforcement learning; Machine Learning -> ML: Robustness