Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning
arXiv (2020)
Abstract
Efficient exploration remains a challenging problem in reinforcement
learning, especially for tasks where extrinsic rewards from environments are
sparse or even totally disregarded. Significant advances based on intrinsic
motivation show promising results in simple environments but often get stuck in
environments with multimodal and stochastic dynamics. In this work, we propose
a variational dynamic model based on the conditional variational inference to
model the multimodality and stochasticity. We consider the environmental
state-action transition as a conditional generative process, generating the
next-state prediction conditioned on the current state, action, and a latent
variable, which provides a better understanding of the dynamics and leads to
better performance in exploration. We derive an upper bound of the
negative log-likelihood of the environmental transition and use such an upper
bound as the intrinsic reward for exploration, which allows the agent to learn
skills by self-supervised exploration without observing extrinsic rewards. We
evaluate the proposed method on several image-based simulation tasks and a real
robotic manipulation task. Our method outperforms several state-of-the-art
environment model-based exploration approaches.
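
As a rough illustration of the idea described in the abstract (not the authors' released code), the sketch below trains a conditional variational model of transitions (s, a) → s' and uses the per-transition negative ELBO, an upper bound on the negative log-likelihood of the transition, as the intrinsic reward. All module names, dimensions, and hyperparameters (e.g., `VariationalDynamics`, `latent_dim`, `intrinsic_reward`) are illustrative assumptions.

```python
# Hypothetical sketch: conditional variational dynamics model whose negative
# ELBO serves as an intrinsic reward. Not the authors' implementation.
import torch
import torch.nn as nn

class VariationalDynamics(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=32, hidden=256):
        super().__init__()
        # Encoder q(z | s, a, s'): infers the latent from the full transition.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim * 2 + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim))  # outputs mean and log-variance
        # Conditional prior p(z | s, a): depends only on current state and action.
        self.prior = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim))
        # Decoder p(s' | s, a, z): generates the next-state prediction.
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + action_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim))

    def neg_elbo(self, s, a, s_next):
        """Per-sample upper bound on -log p(s' | s, a)."""
        q_mu, q_logvar = self.encoder(torch.cat([s, a, s_next], -1)).chunk(2, -1)
        p_mu, p_logvar = self.prior(torch.cat([s, a], -1)).chunk(2, -1)
        z = q_mu + torch.randn_like(q_mu) * (0.5 * q_logvar).exp()  # reparameterize
        recon = self.decoder(torch.cat([s, a, z], -1))
        # Gaussian reconstruction term (unit variance assumed for simplicity).
        rec_loss = 0.5 * ((recon - s_next) ** 2).sum(-1)
        # KL(q(z | s, a, s') || p(z | s, a)) between diagonal Gaussians.
        kl = 0.5 * (p_logvar - q_logvar
                    + (q_logvar.exp() + (q_mu - p_mu) ** 2) / p_logvar.exp()
                    - 1).sum(-1)
        return rec_loss + kl

model = VariationalDynamics(state_dim=64, action_dim=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def intrinsic_reward(s, a, s_next):
    # A larger bound means the transition is poorly explained by the model,
    # so the agent is rewarded for visiting it (novelty signal).
    with torch.no_grad():
        return model.neg_elbo(s, a, s_next)

def update(batch_s, batch_a, batch_s_next):
    # Train the dynamics model by minimizing the same bound on observed data.
    loss = model.neg_elbo(batch_s, batch_a, batch_s_next).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Under this reading, the same quantity is minimized as the model's training objective and maximized (as a reward) by the exploring policy, so no extrinsic reward is needed.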