Classical Planning Algorithms on the Atari Video Games.
AAAI Workshop: Learning for General Competency in Video Games(2015)
摘要
The Atari 2600 games supported in the Arcade Learning Environment (Bellemare et al. 2013) all feature a known initial (RAM) state and actions that have deterministic effects. Classical planners, however, cannot be used for selecting actions for two reasons: first, no compact PDDL-model of the games is given, and more importantly, the action effects and goals are not known a priori. Moreover, in these games there is usually no set of goals to be achieved but rewards to be collected. These features do not preclude the use of classical algorithms like breadth-first search or Dijkstra’s algorithm, but these methods are not effective over large state spaces. We thus turn to a different class of classical planning algorithms introduced recently that perform a structured exploration of the state space; namely, like breadth-first search and Dijkstra’s algorithm they are “blind” and hence do not require prior knowledge of state transitions, costs (rewards) or goals, and yet, like heuristic search algorithms, they have been shown to be effective for solving problems over huge state spaces. The simplest such algorithm, called Iterated Width or IW, consists of a sequence of calls IW(1), IW(2), . . . , IW(k) where IW(i) is a breadth-first search in which a state is pruned when it is not the first state in the search to make true some subset of i atoms. The empirical results over 54 games suggest that the performance of IW with the k parameter fixed to 1, i.e., IW(1), is at the level of the state of the art represented by UCT. A simple best-first variation of IW that combines exploration and exploitation proves to be very competitive as well.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络