Partially Unsupervised Deep Meta-Reinforcement Learning

semanticscholar (2021)

Abstract
Reinforcement Learning (RL) has recently been applied successfully to many domains. One particular example is continuous control, where RL has been shown to solve complex tasks in simulation. However, agents trained in simulation usually cannot be transferred directly to real systems, and training directly on a real system is considered infeasible due to the large number of samples required. This work focuses on the simulation-to-reality (sim-to-real) transfer technique called domain randomization and shows that it is closely related to Meta-Reinforcement Learning (meta-RL). Based on this assessment, we propose a novel algorithm, built on recent advancements in deep meta-RL, that expands the capabilities of domain randomization. In particular, we show that domain randomization struggles with wide domain parameter distributions and demonstrate that our Partially Unsupervised Deep Meta-Reinforcement Learning (PUDM-RL) approach handles these scenarios well. Furthermore, we improve training stability by splitting the training into two separate parts: first, we train a dynamics encoder using unsupervised learning; then, we use the dynamics embedding from the learned encoder to aid the training of the agent. Experimentally, we show that the learned dynamics encoder exhibits an informative latent space by compressing an interaction history from a particular environment into an embedding characterizing its dynamics. We find that agents aided by the dynamics embeddings perform much better than alternative approaches under domain randomization with broad distributions over the dynamics parameters.
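Since the paper's code is not reproduced here, the following is a minimal PyTorch sketch of the two-phase scheme the abstract describes, under assumptions of our own: a recurrent DynamicsEncoder compresses a history of (state, action, next-state) transitions into a fixed-size embedding, a DynamicsDecoder supplies an unsupervised reconstruction loss (one plausible choice; the abstract does not specify the objective), and the policy is then conditioned on the frozen embedding. All class names, dimensions, and the choice of loss are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the two-phase training described in the abstract.
# All names, dimensions, and the reconstruction objective are assumptions.
import torch
import torch.nn as nn

class DynamicsEncoder(nn.Module):
    """Compresses an interaction history (s, a, s') into an embedding
    that characterizes the environment's dynamics."""
    def __init__(self, obs_dim, act_dim, embed_dim=8, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=2 * obs_dim + act_dim,
                          hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, embed_dim)

    def forward(self, history):             # history: (B, T, 2*obs+act)
        _, h = self.rnn(history)
        return self.head(h[-1])              # (B, embed_dim)

class DynamicsDecoder(nn.Module):
    """Predicts s' from (s, a, embedding); the reconstruction error serves
    as the unsupervised training signal for the encoder."""
    def __init__(self, obs_dim, act_dim, embed_dim=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim))

    def forward(self, s, a, z):
        return self.net(torch.cat([s, a, z], dim=-1))

# Phase 1: unsupervised encoder training on rollouts collected under
# randomized dynamics parameters (placeholder random data below).
obs_dim, act_dim, T, B = 4, 2, 32, 16
enc = DynamicsEncoder(obs_dim, act_dim)
dec = DynamicsDecoder(obs_dim, act_dim)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

s = torch.randn(B, T, obs_dim)
a = torch.randn(B, T, act_dim)
s_next = torch.randn(B, T, obs_dim)
history = torch.cat([s, a, s_next], dim=-1)

z = enc(history)                              # (B, embed_dim)
pred = dec(s, a, z.unsqueeze(1).expand(-1, T, -1))
loss = nn.functional.mse_loss(pred, s_next)
opt.zero_grad(); loss.backward(); opt.step()

# Phase 2 (encoder frozen): the policy receives [observation, embedding],
# so a standard RL algorithm can adapt behavior to the inferred dynamics.
policy = nn.Sequential(nn.Linear(obs_dim + 8, 64), nn.Tanh(),
                       nn.Linear(64, act_dim))
action = policy(torch.cat([s[:, -1], z.detach()], dim=-1))
```

Conditioning the policy on the compact embedding z, rather than on the raw interaction history, is what allows a single agent to adapt across a broad distribution of dynamics parameters, consistent with the abstract's claim.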