Deep Reinforcement Learning via Past-Success Directed Exploration

AAAI (2019)

Abstract
Balancing exploration and exploitation is a core challenge in reinforcement learning. This paper proposes a past-success exploration strategy combined with Softmax action selection (PSE-Softmax), an adaptive control method that uses characteristics of the agent's online learning process to adjust exploration parameters dynamically. The strategy is tested on discrete and continuous control tasks in OpenAI Gym, and the experimental results show that PSE-Softmax delivers better performance than deep reinforcement learning algorithms that use basic exploration strategies.
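The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea: Softmax (Boltzmann) action selection whose temperature is adjusted from the agent's own past returns. The function names, the multiplicative `cool`/`heat` factors, and the temperature bounds are all hypothetical and chosen for illustration; they are not taken from the paper and should not be read as the PSE-Softmax algorithm itself.

```python
import numpy as np


def softmax_policy(q_values, temperature):
    """Return Boltzmann action probabilities for the given action values."""
    q = np.asarray(q_values, dtype=float)
    # Subtract the maximum before exponentiating for numerical stability.
    z = (q - q.max()) / temperature
    p = np.exp(z)
    return p / p.sum()


def select_action(q_values, temperature, rng):
    """Sample an action index from the Softmax distribution."""
    probs = softmax_policy(q_values, temperature)
    return int(rng.choice(len(probs), p=probs))


def adapt_temperature(temperature, episode_return, best_return,
                      cool=0.9, heat=1.05, t_min=0.05, t_max=5.0):
    """Hypothetical adaptation rule (an assumption, not the paper's rule).

    Lower the temperature (explore less) when the latest episode matches or
    beats the best past return; raise it (explore more) otherwise.
    """
    factor = cool if episode_return >= best_return else heat
    return float(np.clip(temperature * factor, t_min, t_max))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = [1.0, 2.0, 3.0]          # illustrative action values
    temperature = 1.0
    action = select_action(q, temperature, rng)
    # Pretend the episode return (10.0) beat the best past return (5.0).
    temperature = adapt_temperature(temperature, 10.0, 5.0)
    print(action, temperature)
```

A lower temperature concentrates probability on the highest-valued action, while a higher one flattens the distribution toward uniform, which is why a single scalar is enough to trade exploration against exploitation in this sketch.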