Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

OPERATIONS RESEARCH (2024)

Citations: 10 | Views: 340
Abstract
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider γ-discounted infinite-horizon Markov decision processes (MDPs) with state space S and action space A. Despite a number of prior works tackling this problem, a complete picture of the trade-off between sample complexity and statistical accuracy has yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least |S||A|/(1−γ)². The current paper overcomes this barrier by certifying the minimax optimality of two algorithms, a perturbed model-based algorithm and a conservative model-based algorithm, as soon as the sample size exceeds the order of |S||A|/(1−γ) (modulo some log factor). Moving beyond infinite-horizon MDPs, we further study time-inhomogeneous finite-horizon MDPs and prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information-theoretically infeasible).
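For concreteness, the rate behind these guarantees can be written out. The display below is a paraphrase based on the abstract together with the standard minimax rate for this setting; ε (the target accuracy) and Õ(·) (which hides log factors) are our notation and do not appear in the abstract itself:

    N(ε) = Õ( |S||A| / ((1−γ)³ ε²) )   for every target accuracy ε ∈ (0, 1/(1−γ)].

At the coarsest accuracy ε = 1/(1−γ), this evaluates to Õ(|S||A|/(1−γ)), which is precisely the sample size threshold certified here, whereas prior analyses applied only once the sample size exceeded the order of |S||A|/(1−γ)².

The model-based (plug-in) template that the certified algorithms build on is also simple to sketch. The Python below is a minimal illustration, not the paper's method: sample_next_state, reward, and the parameter names are hypothetical stand-ins, and the paper's two algorithms add a random reward perturbation or a conservative adjustment on top of this plain plug-in step.

import numpy as np

def plug_in_planner(sample_next_state, reward, S, A, gamma, N, iters=1000):
    """Plug-in model-based planning with a generative model (sketch):
    draw N next-state samples per (s, a), form the empirical transition
    kernel P_hat, then run value iteration on the empirical MDP."""
    # Empirical transition model: P_hat[s, a, s'] = fraction of the N draws
    # from (s, a) that landed on s'.
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(N):
                P_hat[s, a, sample_next_state(s, a)] += 1.0 / N

    # Value iteration on the empirical MDP; reward[s, a] assumed in [0, 1].
    V = np.zeros(S)
    for _ in range(iters):
        Q = reward + gamma * (P_hat @ V)   # Q-values, shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-8:
            break
        V = V_new
    return Q.argmax(axis=1), V             # greedy policy and its values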
Keywords
model-based reinforcement learning, minimaxity, policy evaluation, generative model