Policy Gradient Search: Online Planning and Expert Iteration without Search Trees
arXiv: Learning, 2019.
Monte Carlo Tree Search (MCTS) algorithms perform simulation-based search to improve policies online. During search, the simulation policy is adapted to explore the most promising lines of play. MCTS has been used by state-of-the-art programs for many problems; however, a disadvantage of MCTS is that it estimates the values of states with ...