Promoting Quality and Diversity in Population-based Reinforcement Learning via Hierarchical Trajectory Space Exploration

IEEE International Conference on Robotics and Automation (2022)

Abstract
Quality Diversity (QD) algorithms in population-based reinforcement learning aim to optimize agents' returns and the diversity of the population simultaneously. This helps address exploration problems in reinforcement learning and can yield multiple good and diverse strategies. However, previous methods typically define the behavioral embedding in action space or outcome space, which neglects trajectory characteristics during execution. In this paper, we introduce a trajectory embedding model, trained with a Variational Autoencoder (VAE) under a similarity constraint, to characterize trajectory features. Building on this model, we propose a hierarchical trajectory-space exploration (HTSE) framework that uses Determinantal Point Processes (DPP) to generate high-quality and diverse solutions in the selection and mutation process. Experimental results show that HTSE effectively completes several simulated tasks and outperforms other Quality-Diversity Reinforcement Learning algorithms.
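To illustrate how a DPP can trade off quality against diversity when selecting agents, here is a minimal Python sketch. It is not the HTSE implementation: the RBF similarity kernel, the greedy log-determinant selection, and the names `rbf_kernel`, `greedy_dpp_select`, and `bandwidth` are all assumptions made for illustration, and the embeddings stand in for whatever a trained trajectory encoder would produce.

```python
import numpy as np


def rbf_kernel(embeddings, bandwidth=1.0):
    """Pairwise RBF similarity between embeddings (assumed kernel choice)."""
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def greedy_dpp_select(embeddings, quality, k, bandwidth=1.0):
    """Greedily pick up to k indices that maximize log det of the DPP kernel.

    The kernel is L = diag(q) S diag(q), so a subset scores highly when
    its members have high quality (large q) and dissimilar embeddings.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    quality = np.asarray(quality, dtype=float)
    similarity = rbf_kernel(embeddings, bandwidth)
    kernel = quality[:, None] * similarity * quality[None, :]

    selected = []
    for _ in range(k):
        best_index, best_logdet = None, -np.inf
        for i in range(len(quality)):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_index, best_logdet = i, logdet
        if best_index is None:
            break  # no remaining candidate keeps the determinant positive
        selected.append(best_index)
    return selected


if __name__ == "__main__":
    # Hypothetical 2-D trajectory embeddings: two near-duplicates and one outlier.
    toy_embeddings = [[0.0, 0.0], [0.01, 0.0], [5.0, 5.0]]
    toy_quality = [1.0, 1.0, 1.0]
    print(greedy_dpp_select(toy_embeddings, toy_quality, k=2))
```

With equal quality scores, the selection avoids picking both near-duplicate embeddings; raising one agent's quality makes it more likely to be chosen first. The exhaustive greedy loop recomputes a determinant per candidate, which is fine for small populations but would need an incremental update for large ones.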
Keywords
reinforcement learning, exploration, diversity, trajectory, population-based