
MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator

IJCAI 2024 (2024)

Abstract
Offline reinforcement learning (RL) faces the significant challenge of distribution shift. Model-free offline RL penalizes the Q value for out-of-distribution (OOD) data or constrains the policy to stay close to the behavior policy to tackle this problem, but this inhibits exploration of the OOD region. Model-based offline RL, which uses a trained environment model to generate more OOD data and performs conservative policy optimization within that model, has become an effective method for this problem. However, current model-based algorithms rarely consider agent robustness when incorporating conservatism into the policy. Therefore, a new model-based offline algorithm with a conservative Bellman operator (MICRO) is proposed. This method trades off performance and robustness by introducing the robust Bellman operator into the algorithm. Compared with previous model-based algorithms that rely on robust adversarial models, MICRO significantly reduces computation cost by only choosing the minimal Q value in the state uncertainty set. Extensive experiments demonstrate that MICRO outperforms prior RL algorithms on offline RL benchmarks and is considerably robust to adversarial perturbations.
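To make the conservative Bellman operator described above concrete, the following is a minimal sketch (not the authors' released code): instead of training an adversarial environment model, the Bellman target takes the minimal Q value over a small sampled uncertainty set around the next state. The names `q_net`, `policy`, `radius`, and `n_samples` are illustrative assumptions, as is the use of uniform noise to approximate the state uncertainty set.

```python
# Sketch of a conservative Bellman target: r + gamma * min_{s' in U(s')} Q(s', pi(s')).
# q_net, policy, radius, n_samples are hypothetical names, not from the paper's code.
import torch

def conservative_bellman_target(q_net, policy, reward, next_state,
                                gamma=0.99, radius=0.01, n_samples=8):
    # Approximate the state uncertainty set with uniform samples in an
    # L_inf ball of the given radius around the predicted next state.
    noise = (torch.rand(n_samples, *next_state.shape) * 2 - 1) * radius
    perturbed = next_state.unsqueeze(0) + noise            # (n_samples, batch, state_dim)
    flat = perturbed.reshape(-1, next_state.shape[-1])

    with torch.no_grad():
        actions = policy(flat)                              # pi(s') for each perturbed state
        q_values = q_net(flat, actions)                     # Q(s', pi(s'))
        q_values = q_values.reshape(n_samples, -1)
        q_min = q_values.min(dim=0).values                  # minimal Q over the uncertainty set

    return reward + gamma * q_min
```

Under these assumptions, enlarging `radius` or `n_samples` makes the target more conservative and more robust to state perturbations at the cost of only a few extra Q evaluations, which is the computational saving over adversarial-model approaches that the abstract highlights.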
Keywords
Machine Learning -> ML: Reinforcement learning, Machine Learning -> ML: Model-based and model learning reinforcement learning, Machine Learning -> ML: Offline reinforcement learning, Machine Learning -> ML: Robustness