Policy Evaluation Using the Ω-Return

Philip S. Thomas, Scott Niekum, Georgios Theocharous, George Konidaris

Neural Information Processing Systems (2015)


Abstract

We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation of different length returns. Because it is difficult to compute exactly, we suggest one way of approximating the Ω-return. We provide empirical studies that suggest that it is superior to the λ-return and γ-return for a variety of problems.
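For context, the λ-return that the abstract uses as its baseline can be sketched as a fixed weighted average of n-step returns. The snippet below is a minimal illustration under stated assumptions, not the paper's method: the function names `n_step_returns` and `lambda_return`, the episodic setting, and the toy numbers are all hypothetical. It shows that the λ-return weights depend only on the return length n, which is the limitation the abstract says the Ω-return addresses by accounting for correlation between returns of different lengths.

```python
def n_step_returns(rewards, values, gamma):
    """Return the 1-step through L-step returns from time 0 of one episode.

    Assumed layout (hypothetical, for illustration only):
    rewards[t] is the reward received after step t, and values[t] is the
    current value estimate of the state at time t. values has one more
    entry than rewards; the last entry is the terminal value (usually 0).
    """
    returns = []
    discounted_sum = 0.0
    for n in range(1, len(rewards) + 1):
        # Add the n-th reward, discounted by gamma^(n-1).
        discounted_sum += gamma ** (n - 1) * rewards[n - 1]
        # Bootstrap from the value estimate n steps ahead.
        returns.append(discounted_sum + gamma ** n * values[n])
    return returns


def lambda_return(returns, lam):
    """Combine n-step returns with the standard episodic lambda weights.

    The weight on the n-step return is (1 - lam) * lam^(n-1) for n < L,
    and the remaining mass lam^(L-1) goes to the full (L-step) return,
    so the weights sum to 1. They depend only on n, not on how the
    returns vary or co-vary.
    """
    length = len(returns)
    weights = [(1 - lam) * lam ** (n - 1) for n in range(1, length)]
    weights.append(lam ** (length - 1))
    return sum(w * g for w, g in zip(weights, returns))


# Toy episode with made-up numbers.
rewards = [1.0, 1.0, 1.0]
values = [0.0, 0.5, 0.5, 0.0]
returns = n_step_returns(rewards, values, gamma=0.9)
blended = lambda_return(returns, lam=0.5)
```

With `lam=0` this reduces to the one-step TD target and with `lam=1` to the Monte Carlo return; intermediate values interpolate between the two with the same weights regardless of the data.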
