Multi-critic DDPG Method and Double Experience Replay

2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

Abstract
The Deep Deterministic Policy Gradient (DDPG) reinforcement learning method consists of actor learning and critic learning. Actor learning relies heavily on critic learning, which makes the performance of DDPG sensitive to the quality of the critic and leads to stability issues. To further improve the stability and performance of DDPG, the multi-critic DDPG method (MCDDPG) is proposed for more reliable critic learning. The average value of multiple critics replaces the single critic in DDPG, giving better resistance when one critic performs badly, and the multiple independent critics can acquire knowledge from the environment more broadly. In addition, an extension of the experience replay mechanism, double experience replay, is introduced to accelerate the training process. All methods are tested on simulated environments from the OpenAI Gym platform, and convincing experimental results support the proposed methods.
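
The following is a minimal sketch (not the authors' code) of the critic-averaging idea described above: the TD target is computed from the mean of several independent target critics, so that one badly trained critic has less influence on the actor and critic updates. The network architecture, number of critics, and the use of PyTorch are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class Critic(nn.Module):
    """A simple Q(s, a) network; sizes are illustrative."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def averaged_td_target(target_critics, target_actor, reward, next_state, done, gamma=0.99):
    """TD target built from the mean of several independent target critics."""
    with torch.no_grad():
        next_action = target_actor(next_state)
        # Stack Q-estimates from all target critics and average them,
        # so a single poorly performing critic is dampened by the others.
        q_values = torch.stack(
            [critic(next_state, next_action) for critic in target_critics], dim=0
        )
        q_mean = q_values.mean(dim=0)
        return reward + gamma * (1.0 - done) * q_mean
```

Under this sketch, each critic would be regressed toward the shared averaged target and the actor would be updated against the mean of the critics' Q-estimates; the exact training schedule and the double experience replay variant follow the paper rather than this illustration.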
Keywords
multiple independent critics, multi-critic DDPG method, actor learning, reliable critic learning, single critic, deep deterministic policy gradient reinforcement learning method, OpenAI Gym platform