Prioritized experience replay in DDPG via multi-dimensional transition priorities calculation

Nuo Cheng,Peng Wang,Guangyuan Zhang,Cui Ni, Hui Gong

crossref(2022)

引用 0|浏览0
暂无评分
摘要
Abstract The path planning algorithm of intelligent robot based on DDPG uses uniform random experience replay mechanism, cannot distinguish the importance of experience samples to the algorithm training process, and has some problems, such as unreasonable sampling of experience transitions and excessive use of edge experience, which lead to slow convergence speed and low success rate of path planning. In this paper, The priorities of experience transition based on the immediate reward, temporal-difference (TD) error and the loss function of Actor network are calculated respectively, and the information entropy is used as the weight to fuse the three priorities as the final priority. Furthermore, in order to effectively use the positive experience transitions and ensure the diversity of experience transitions, a method of increasing and decreasing the priority of positive experience transition is proposed. Finally, the sampling probability is calculated according to the priority of experience transition. The experimental results show that our proposed prioritized experience replay can not only improve the utilization rate of experience transitions and accelerate the convergence speed of DDPG, but also effectively improve the success rate of path planning, so as to provide a better guarantee for the robot to safely reach the target point.
更多
查看译文
关键词
prioritized experience replay,ddpg,multi-dimensional
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要