Unlocking the Power of Representations in Long-term Novelty-based Exploration

Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot

arXiv (Cornell University) (2023)

Abstract

We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space. By adapting classical clustering to the nonstationary setting of deep RL, RECODE can efficiently track state visitation counts over thousands of episodes. We further propose a novel generalization of the inverse dynamics loss that leverages masked transformer architectures for multi-step prediction. Combined with RECODE, this approach achieves a new state of the art on DM-Hard-8, a suite of challenging 3D-exploration tasks. RECODE also sets a new state of the art in hard-exploration Atari games and is the first agent to reach the end screen in "Pitfall!".
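The abstract describes RECODE only at a high level. To illustrate the general idea of counting visits to clusters of embeddings rather than to exact states, here is a minimal Python sketch. It is not the authors' algorithm. The class name `ClusterCountEstimator`, the fixed `radius`, the exponential `decay` of counts (a simple stand-in for handling nonstationarity), least-visited eviction, and the 1/sqrt(n) novelty bonus are all assumptions chosen for illustration.

```python
import numpy as np


class ClusterCountEstimator:
    """Hypothetical sketch of clustering-based visitation counts.

    Assumption: this is NOT the RECODE reference implementation; it only
    illustrates counting visits per cluster in an embedding space.
    """

    def __init__(self, capacity=1000, radius=1.0, decay=0.999, dim=2):
        self.capacity = capacity  # max number of clusters kept in memory
        self.radius = radius      # assumed: max distance to join a cluster
        self.decay = decay        # assumed: per-step count decay factor
        self.centroids = np.zeros((0, dim))
        self.counts = np.zeros(0)

    def update(self, embedding):
        """Insert one embedding and return the (decayed) count of its cluster."""
        embedding = np.asarray(embedding, dtype=float)
        # Decay all counts so clusters that are no longer visited fade out.
        self.counts *= self.decay
        if len(self.counts) > 0:
            dists = np.linalg.norm(self.centroids - embedding, axis=1)
            idx = int(np.argmin(dists))
            if dists[idx] <= self.radius:
                self.counts[idx] += 1.0
                # Move the centroid toward the new point (running mean,
                # exact when decay == 1.0).
                self.centroids[idx] += (embedding - self.centroids[idx]) / self.counts[idx]
                return self.counts[idx]
        # No nearby cluster: open a new one, evicting the least-visited if full.
        if len(self.counts) >= self.capacity:
            evict = int(np.argmin(self.counts))
            self.centroids = np.delete(self.centroids, evict, axis=0)
            self.counts = np.delete(self.counts, evict)
        self.centroids = np.vstack([self.centroids, embedding])
        self.counts = np.append(self.counts, 1.0)
        return 1.0

    def intrinsic_reward(self, embedding):
        """Assumed count-based novelty bonus: 1 / sqrt(cluster count)."""
        return 1.0 / np.sqrt(self.update(embedding))


est = ClusterCountEstimator(capacity=2, radius=0.5, decay=1.0, dim=2)
est.update([0.0, 0.0])              # new cluster, count 1
est.update([0.1, 0.0])              # joins the first cluster, count 2
print(est.intrinsic_reward([0.0, 0.0]))  # bonus shrinks as the cluster is revisited
```

Counting per cluster means that states which are near-duplicates in the embedding space share one count, so the bonus decreases for regions the agent keeps revisiting. How well this works depends on the embedding, which is why the abstract pairs the counting method with a learned representation.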

Keywords

exploration, representations, long-term, novelty-based