Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning
NeurIPS 2020 (2020)
Abstract
We study how to use unsupervised learning for efficient exploration in reinforcement learning with rich observations generated from a small number of latent states. We present a novel algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret reinforcement learning algorithm. We show that our algorithm provably finds a near-optimal policy with sample complexity polynomial in the number of latent states, which is significantly smaller than the number of possible observations. Our result gives theoretical justification to the prevailing paradigm of using unsupervised learning for efficient exploration [tang2017exploration,bellemare2016unifying].
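The two-component framework described above can be illustrated with a minimal sketch: an unsupervised learner (here plain k-means, chosen only for concreteness) decodes rich observations down to a small set of latent states, and a tabular RL algorithm (here a simple Q-learning loop standing in for the no-regret learner) then operates over those decoded states. The toy block-MDP environment, the decoder, and all constants below are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy block MDP (assumed for illustration): 3 latent states, each emitting
# noisy 16-dimensional "rich" observations around a state-specific center.
N_LATENT, N_ACT, OBS_DIM = 3, 2, 16
centers = rng.normal(size=(N_LATENT, OBS_DIM)) * 3.0

def emit(s):
    """Rich observation generated from latent state s."""
    return centers[s] + rng.normal(scale=0.5, size=OBS_DIM)

# --- Component 1: unsupervised learning (k-means as a stand-in) ---
def kmeans(X, k, iters=50):
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                C[j] = X[labels == j].mean(0)
    return C

def decode(obs, C):
    """Map a rich observation to its nearest learned latent state."""
    return int(np.argmin(((C - obs) ** 2).sum(-1)))

# Learn a decoder from a batch of observations.
X = np.stack([emit(rng.integers(N_LATENT)) for _ in range(600)])
C = kmeans(X, N_LATENT)

# --- Component 2: tabular RL over decoded latent states ---
# Random latent dynamics and rewards, again purely illustrative.
P = rng.dirichlet(np.ones(N_LATENT), size=(N_LATENT, N_ACT))
R = rng.random((N_LATENT, N_ACT))
Q = np.zeros((N_LATENT, N_ACT))

s = 0
for t in range(5000):
    z = decode(emit(s), C)  # the agent only sees the decoded state
    a = int(rng.integers(N_ACT)) if rng.random() < 0.1 else int(np.argmax(Q[z]))
    s2 = rng.choice(N_LATENT, p=P[s, a])
    z2 = decode(emit(s2), C)
    Q[z, a] += 0.1 * (R[s, a] + 0.95 * Q[z2].max() - Q[z, a])
    s = s2

# The learned table is polynomial in the number of latent states,
# not in the (much larger) observation space.
print(Q.shape)
```

The point of the sketch is the interface between the components: the RL learner's tables are indexed by decoded latent states, so its sample complexity scales with the latent-state count rather than with the number of possible observations.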
Keywords
Unsupervised learning, Reinforcement learning, Markov decision process, Polynomial, Class (biology), Artificial intelligence, Computer science, Sample complexity