Recurrent prediction model for partially observable MDPs

Information Sciences (2022)

Abstract
• Temporal information is effectively integrated into the representation model.
• A new prediction model is proposed to capture temporal information.
• The replay buffer requires less memory capacity than in existing methods.
• The policy lag is proven to decrease quickly in maximum-entropy reinforcement learning.
Keywords
Reinforcement learning, Partially observable Markov decision processes, Autonomous system, Sequential decision system