Past, Present, and Future: An Optimal Online Algorithm for Single-Player GDL-II Games

ECAI'14: Proceedings of the Twenty-first European Conference on Artificial Intelligence (2014)

Abstract
In General Game Playing, a player receives the rules of an unknown game and attempts to maximize its expected reward. Since 2011, the GDL-II extension of the rule language has allowed the formulation of nondeterministic and partially observable games. In this paper, we present an algorithm for such games, with a focus on the single-player case. Conceptually, at each stage, the proposed NORNS algorithm distinguishes between the past, present and future steps of the game. More specifically, a belief state tree is used to simulate a potential past that leads to a present consistent with the received observations. Unlike other related methods, our method is asymptotically optimal. Moreover, augmenting the belief state tree with iteratively improved probabilities significantly speeds up the process over time. As this allows a true picture of the present, we additionally present an optimal version of the well-known UCT algorithm for partially observable single-player games. Instead of performing hindsight optimization on a simplified, fully observable tree, the true future is simulated on an action-observation tree that takes partial observability into account. The expected reward estimates of applicable actions converge towards the true expected rewards, even for moves that are only used to gather information. We prove that our algorithm is asymptotically optimal for single-player games and POMDPs and support our claim with an empirical evaluation.
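For readers unfamiliar with UCT, the selection rule it builds on (UCB1) can be sketched as follows. This is a generic illustration of standard UCB1, not the NORNS algorithm or the authors' modified UCT; the function name `ucb1_select`, the `stats` layout, and the exploration constant are assumptions made for this example. In an action-observation tree, one such statistics table would presumably be kept per node, where a node is identified by the history of actions and observations leading to it.

```python
import math


def ucb1_select(stats, exploration=math.sqrt(2)):
    """Pick an action with the standard UCB1 rule.

    `stats` maps each action to a (visits, total_reward) pair.
    This layout is a hypothetical choice for the example.
    """
    # Try every untried action once before applying the bound.
    for action, (visits, _) in stats.items():
        if visits == 0:
            return action

    total_visits = sum(visits for visits, _ in stats.values())

    def score(action):
        visits, total_reward = stats[action]
        mean = total_reward / visits
        # Exploration bonus shrinks as an action is sampled more often.
        bonus = exploration * math.sqrt(math.log(total_visits) / visits)
        return mean + bonus

    return max(stats, key=score)


# Usage: a rarely tried action is preferred over a well-explored one
# with a slightly higher mean reward.
choice = ucb1_select({"probe": (1, 0.5), "commit": (100, 60.0)})
```

The exploration bonus is what lets rarely sampled moves, including purely information-gathering ones, keep receiving simulations so that their reward estimates can continue to improve.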