Sequential and Dynamic constraint Contrastive Learning for Reinforcement Learning

2021 International Joint Conference on Neural Networks (IJCNN), 2021

Abstract
Contrastive unsupervised learning shows remarkable promise for improving sample efficiency in reinforcement learning, especially with high-dimensional observations, by extracting latent features from raw inputs. However, prior works rarely take sequential information and knowledge of dynamic transitions into account when constructing contrastive samples. In this paper, we propose Sequential and Dynamic constraint Contrastive Reinforcement Learning (SDCRL) to improve sample efficiency in settings with high-dimensional inputs (e.g., images). We first construct a sequential contrastive module to extract latent features with sequential information from correlated raw image inputs. We then add a dynamic transition classification module to capture knowledge of state transitions. We validate the proposed method in the low-sample regime (few environment interactions). Our algorithm surpasses prior pixel-based approaches on complex tasks in the DeepMind Control Suite and even matches or exceeds the performance of a method that uses state-based features as inputs on 11 out of 15 tasks. On Atari 2600 games, SDCRL also outperforms strong baselines and achieves state-of-the-art performance on 7 out of 26 games.
Keywords
Reinforcement Learning, Sample Efficiency, Sequential Contrastive Learning, Dynamic Transition Classification
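
The paper's code is not included here, so the following is only a minimal PyTorch sketch of the two auxiliary objectives the abstract describes: an InfoNCE-style contrastive loss over encoded observation views (sequential contrastive module) and a binary classifier over latent transitions (dynamic transition classification module). All module names, network sizes, input shapes, and hyperparameters (e.g., `Encoder`, `TransitionClassifier`, `temperature=0.1`, 84x84 frames) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): an InfoNCE-style contrastive loss over
# encoded observation views plus a binary classifier over (z_t, a_t, z_{t+1}) transitions.
# All names, shapes, and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Toy convolutional encoder mapping stacked frames to a latent vector."""
    def __init__(self, in_channels=9, latent_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            n_flat = self.conv(torch.zeros(1, in_channels, 84, 84)).shape[1]
        self.fc = nn.Linear(n_flat, latent_dim)

    def forward(self, obs):
        return self.fc(self.conv(obs))


def info_nce_loss(anchors, positives, temperature=0.1):
    """Standard InfoNCE: each anchor's positive is the matching row in `positives`;
    all other rows in the batch act as negatives."""
    anchors = F.normalize(anchors, dim=1)
    positives = F.normalize(positives, dim=1)
    logits = anchors @ positives.t() / temperature  # (B, B) similarity matrix
    labels = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, labels)


class TransitionClassifier(nn.Module):
    """Binary head scoring whether (z_t, a_t, z_{t+1}) is a real environment transition."""
    def __init__(self, latent_dim=50, action_dim=6, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z_t, action, z_next):
        return self.net(torch.cat([z_t, action, z_next], dim=1)).squeeze(1)


if __name__ == "__main__":
    B, C, H, W, A = 8, 9, 84, 84, 6
    enc = Encoder(in_channels=C)
    clf = TransitionClassifier(action_dim=A)

    # Two independently augmented views of the same observation sequence.
    obs_view1 = torch.rand(B, C, H, W)
    obs_view2 = torch.rand(B, C, H, W)
    z1, z2 = enc(obs_view1), enc(obs_view2)
    contrastive = info_nce_loss(z1, z2)

    # Real next observations vs. shuffled (negative) next observations.
    actions = torch.rand(B, A)
    z_next_real = enc(torch.rand(B, C, H, W))
    z_next_fake = z_next_real[torch.randperm(B)]
    logits = torch.cat([clf(z1, actions, z_next_real), clf(z1, actions, z_next_fake)])
    labels = torch.cat([torch.ones(B), torch.zeros(B)])
    transition = F.binary_cross_entropy_with_logits(logits, labels)

    loss = contrastive + transition  # joint auxiliary objective alongside the RL loss
    loss.backward()
    print(float(loss))
```

In an actual agent these auxiliary losses would be optimized jointly with the RL objective and the encoder shared with the policy/value networks; how SDCRL combines them is specified in the paper, not in this sketch.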