Approximate Relative Value Learning for Average-reward Continuous State MDPs

UAI (2020)

Abstract
In this paper, we propose an approximate relative value learning (ARVL) algorithm for nonparametric MDPs with a continuous state space and a finite action set, under the average-reward criterion. It is a sampling-based algorithm that combines kernel density estimation with nearest-neighbor function approximation. The theoretical analysis is carried out via a random contraction operator framework and a stochastic dominance argument. To the best of our knowledge, this is the first algorithm with such provable properties for continuous state space MDPs under the average-reward criterion that does not require any discretization of the state space. We then numerically evaluate the proposed algorithm on a benchmark problem.
Keywords
approximate relative value learning, average-reward
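The abstract does not spell out the algorithm's details, but a minimal sketch of the general idea it describes (relative value iteration over sampled states, with a one-step simulated transition per backup and nearest-neighbor interpolation of the value function) might look as follows. Everything here is a hypothetical stand-in for illustration: the toy 1-D dynamics, simulate_step, knn_value, and the constants are assumptions, not the paper's ARVL algorithm.

```python
# Hedged sketch: relative value iteration with sampled transitions and
# k-nearest-neighbor value interpolation over a continuous 1-D state space.
# All dynamics and names below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 200      # sampled support points in the continuous state space
N_ACTIONS = 2
K_NEIGHBORS = 5     # neighbors used to interpolate the relative value
N_ITERS = 100

states = rng.uniform(-1.0, 1.0, size=(N_STATES, 1))  # 1-D state samples

def simulate_step(s, a):
    """Hypothetical toy dynamics; reward is highest near the origin."""
    s_next = 0.9 * s + (0.2 if a == 1 else -0.2) + 0.05 * rng.normal()
    s_next = float(np.clip(s_next, -1.0, 1.0))
    reward = -abs(s_next)
    return s_next, reward

def knn_value(h, query):
    """Approximate h(query) by averaging its k nearest sampled states."""
    dists = np.abs(states[:, 0] - query)
    idx = np.argsort(dists)[:K_NEIGHBORS]
    return float(h[idx].mean())

h = np.zeros(N_STATES)  # relative value estimates at the sampled states
ref = 0                 # reference state index used for normalization

for _ in range(N_ITERS):
    h_new = np.empty_like(h)
    for i, s in enumerate(states[:, 0]):
        q_vals = []
        for a in range(N_ACTIONS):
            s_next, r = simulate_step(s, a)          # one sampled transition
            q_vals.append(r + knn_value(h, s_next))  # Bellman backup via k-NN
        h_new[i] = max(q_vals)
    rho = h_new[ref]    # crude average-reward estimate (value at reference)
    h = h_new - rho     # relative normalization keeps the iterates bounded

print(f"estimated average reward: {rho:.3f}")
```

Subtracting the reference-state value each iteration is the standard relative value iteration trick for average-reward problems; without it the value estimates would grow without bound.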