Improving PAC Exploration Using the Median of Means

Advances in Neural Information Processing Systems 29 (NIPS 2016)

Abstract
We present the first application of the median of means in a PAC exploration algorithm for MDPs. Using the median of means allows us to significantly reduce the dependence of our bounds on the range of values that the value function can take, while introducing a dependence on the (potentially much smaller) variance of the Bellman operator. Additionally, our algorithm is the first algorithm with PAC bounds that can be applied to MDPs with unbounded rewards.
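For readers unfamiliar with the estimator named in the abstract, the following is a minimal sketch of the generic median-of-means estimator, not the paper's exploration algorithm. The function name `median_of_means` and the parameter `num_groups` are illustrative assumptions; the abstract does not specify how the estimator is parameterized or where it enters the PAC analysis.

```python
import statistics


def median_of_means(samples, num_groups):
    """Generic median-of-means estimate of the mean of `samples`.

    Hypothetical helper for illustration only: split the samples into
    `num_groups` equal-size groups, average each group, and return the
    median of those group averages. Leftover samples that do not fill
    a complete group are dropped.
    """
    if num_groups < 1 or num_groups > len(samples):
        raise ValueError("num_groups must be between 1 and len(samples)")
    group_size = len(samples) // num_groups
    group_means = []
    for g in range(num_groups):
        group = samples[g * group_size:(g + 1) * group_size]
        group_means.append(sum(group) / len(group))
    return statistics.median(group_means)
```

With the samples `[1.0, 1.0, 1.0, 1.0, 1000.0, 1.0]` and three groups, the single outlier affects only one group mean, so the median of the group means stays at 1.0 while the plain sample mean is pulled above 167. This robustness to heavy tails is the general reason the estimator's error can be controlled by variance rather than by the range of the samples.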