Parallel Multi-Environment Shaping Algorithm for Complex Multi-step Task

Neurocomputing (2020)

Abstract
Complex multi-step tasks pose a major challenge in reinforcement learning because of their sparse rewards and sequential structure: the agent must complete several consecutive steps to finish the whole task without receiving any intermediate reward. Reward Shaping and Curriculum Learning are commonly used to address this challenge, but Reward Shaping is prone to sub-optimal policies and Curriculum Learning easily suffers from catastrophic forgetting. In this paper, we propose a novel algorithm called Parallel Multi-Environment Shaping (PMES), in which several sub-environments, each corresponding to a key intermediate step, are built from human knowledge to make the agent aware of the importance of intermediate steps. Specifically, the agent is trained simultaneously in the original environment and these sub-environments using the synchronous advantage actor-critic algorithm, and an adaptive reward shaping mechanism adjusts the reward function during training. In this way, PMES incorporates human experience through multiple different environments rather than only shaping the reward function, combining the benefits of Reward Shaping and Curriculum Learning while avoiding their drawbacks. Extensive experiments on the 'Build Marines' mini-game of the StarCraft II environment show that the proposed algorithm is more effective than Reward Shaping, Curriculum Learning, and PLAID, approaching the level of a human Grandmaster. Compared with existing work, it also takes less time and fewer computing resources to reach a good result.
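The abstract outlines the core training loop: synchronous rollouts are collected from the original environment and several human-designed sub-environments, with an adaptive coefficient weighting the shaped reward against the sparse task reward. The snippet below is a minimal Python sketch of that idea, not the authors' implementation; the environment interface, the sub-environment names ('collect_minerals', 'build_barracks'), and the mixing coefficient `alpha` are all illustrative assumptions.

```python
# Minimal sketch of the parallel multi-environment idea described above.
# All names (SubEnv, pmes_rollout, alpha, ...) are illustrative assumptions,
# not the published PMES implementation.
import numpy as np

class SubEnv:
    """Stand-in for one environment: either the original task or a
    human-designed sub-environment that rewards a key intermediate step."""
    def __init__(self, name, shaped_bonus):
        self.name = name
        self.shaped_bonus = shaped_bonus  # bonus attached to the intermediate step

    def reset(self):
        return np.zeros(4)  # dummy observation

    def step(self, action):
        obs = np.random.randn(4)
        sparse_reward = 0.0  # the original task pays out only at the very end
        shaped_reward = self.shaped_bonus * np.random.rand()  # sub-env signal
        done = np.random.rand() < 0.05
        return obs, sparse_reward, shaped_reward, done

def pmes_rollout(envs, policy, alpha, horizon=8):
    """Collect synchronous rollouts from all environments (A2C-style) and
    mix sparse and shaped rewards with an adaptive coefficient `alpha`."""
    batch = []
    for env in envs:
        obs = env.reset()
        for _ in range(horizon):
            action = policy(obs)
            obs, r_sparse, r_shaped, done = env.step(action)
            # Adaptive reward shaping: `alpha` can be annealed over training
            # so the final policy optimizes the true sparse objective.
            batch.append((obs, action, r_sparse + alpha * r_shaped))
            if done:
                break
    return batch

# Usage: one original environment plus sub-environments for intermediate steps.
envs = [SubEnv("original", 0.0),
        SubEnv("collect_minerals", 1.0),   # hypothetical intermediate step
        SubEnv("build_barracks", 1.0)]     # hypothetical intermediate step
policy = lambda obs: np.random.randint(4)  # placeholder random policy
data = pmes_rollout(envs, policy, alpha=0.5)
```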
Keywords
Reinforcement Learning, Multi-step Task, Parallel Multiple Environments, Adaptive Reward Shaping