Conservative network for offline reinforcement learning

Knowledge-Based Systems (2023)

Abstract
Offline reinforcement learning (RL) aims to learn policies from static datasets. Value overestimation for out-of-distribution (OOD) actions makes it difficult to apply general RL methods directly in the offline setting. To overcome this problem, many works focus on estimating the value function conservatively or pessimistically. However, existing methods require additional OOD sampling or uncertainty estimation to underestimate OOD values, which makes them complex and sensitive to hyperparameters. Is it possible to design a value function that is automatically conservative on OOD samples? In this study, we reveal the anti-conservative behavior of the widely used ReLU network under certain conditions and explain the reason theoretically. Based on this analysis of the ReLU network, we propose a novel neural network architecture that pushes down the values of samples lying far from the dataset; we call this new architecture the Conservative Network (ConsNet). Building on ConsNet, we propose a new offline RL algorithm that is simple to implement and achieves high performance. Because ConsNet provides additional conservatism by itself, integrating it into several existing offline RL methods can significantly improve their performance or reduce their original algorithmic complexity. Given its simplicity and effectiveness, we hope that ConsNet can become a fundamental network architecture for offline RL.
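To make the core idea concrete, the following is a minimal, hypothetical sketch of a value network whose output is pushed down for inputs far from the offline data. It is not the paper's actual ConsNet design (the abstract does not specify the architecture): the class name ToyConservativeQNet, the anchor-point construction, and the distance-based penalty are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class ToyConservativeQNet(nn.Module):
    """Toy Q-network that subtracts a penalty growing with the distance
    between the input and a set of reference points taken from the dataset.
    Illustrative only; not the ConsNet architecture from the paper."""

    def __init__(self, obs_dim, act_dim, hidden=256, n_anchors=64, beta=1.0):
        super().__init__()
        in_dim = obs_dim + act_dim
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        # Anchor points; in practice these would be (state, action) pairs
        # sampled from the offline dataset rather than random vectors.
        self.register_buffer("anchors", torch.randn(n_anchors, in_dim))
        self.beta = beta  # strength of the conservatism penalty

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        q = self.backbone(x)
        # Distance to the nearest anchor: inputs far from the data
        # receive a larger penalty, so their predicted value is pushed down.
        dist = torch.cdist(x, self.anchors).min(dim=-1, keepdim=True).values
        return q - self.beta * dist

# Usage sketch: batch of 32 transitions with 17-dim states and 6-dim actions.
net = ToyConservativeQNet(obs_dim=17, act_dim=6)
q_values = net(torch.randn(32, 17), torch.randn(32, 6))  # shape (32, 1)
```

The design choice this toy example illustrates is that the conservatism is built into the function approximator itself, so no separate OOD sampling or uncertainty-estimation procedure is needed during training.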
Keywords
Reinforcement learning, Offline reinforcement learning, OOD prediction, Activation functions, Ensemble methods