Adaptive estimation Q-learning with uncertainty and familiarity

IJCAI 2023 (2023)

Abstract
One of the key problems in model-free deep reinforcement learning is how to obtain more accurate value estimates. The most widely used off-policy algorithms suffer from over- or underestimation bias, which may lead to unstable policies. In this paper, we propose a novel method, Adaptive Estimation Q-learning (AEQ), which uses uncertainty and familiarity to control the value estimation naturally and can adapt it for each specific state-action pair. We theoretically prove a property of our familiarity term which can keep the expected estimation bias approximately zero, and experimentally demonstrate that our dynamic estimation improves performance and prevents the bias from continuously increasing. We evaluate AEQ on several continuous control tasks, where it outperforms state-of-the-art methods. Moreover, AEQ is simple to implement and can be applied to any off-policy actor-critic algorithm.
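The abstract does not spell out the update rule, but the idea of steering a bootstrapped target with ensemble uncertainty scaled by a familiarity weight can be illustrated with a minimal sketch. The `familiarity` function, the visit-count proxy, and `beta_max` below are hypothetical placeholders chosen for illustration, not the paper's actual definitions.

```python
# Illustrative sketch only (not AEQ's exact formulation): an adaptive Q-target
# whose pessimism penalty on ensemble uncertainty shrinks as a state-action
# pair becomes more familiar.
import numpy as np

def familiarity(visit_count, tau=10.0):
    """Hypothetical familiarity in [0, 1): grows with how often a
    state-action pair has been visited (visit-count proxy)."""
    return visit_count / (visit_count + tau)

def adaptive_q_target(q_ensemble, reward, done, visit_count,
                      gamma=0.99, beta_max=1.0):
    """Bootstrapped target from an ensemble of next-state Q estimates.

    q_ensemble  : array of shape (n_critics,), next-state Q-values
    visit_count : how familiar the next state-action pair is
    The std penalty decays with familiarity, so well-explored pairs are
    estimated near the ensemble mean while unfamiliar pairs are treated
    pessimistically.
    """
    q_mean = q_ensemble.mean()
    q_std = q_ensemble.std()                    # epistemic-uncertainty proxy
    beta = beta_max * (1.0 - familiarity(visit_count))
    next_value = q_mean - beta * q_std          # adaptive pessimism
    return reward + gamma * (1.0 - done) * next_value

# Example: two critics disagree on a rarely visited pair -> pessimistic target
print(adaptive_q_target(np.array([10.0, 14.0]), reward=1.0, done=0.0,
                        visit_count=2))
```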