Policy Gradient Search: Online Planning and Expert Iteration without Search Trees

arXiv: Learning, 2019.

Abstract:

Monte Carlo Tree Search (MCTS) algorithms perform simulation-based search to improve policies online. During search, the simulation policy is adapted to explore the most promising lines of play. MCTS has been used by state-of-the-art programs for many problems; however, a disadvantage of MCTS is that it estimates the values of states with ...
