Maximum Entropy Monte-Carlo Planning
NeurIPS, pp. 9516-9524, 2019.
We develop a new algorithm for online planning in large-scale sequential decision problems that improves upon the worst-case efficiency of UCT. The idea is to augment Monte-Carlo Tree Search (MCTS) with maximum entropy policy optimization, evaluating each search node by softmax values back-propagated from simulation. To establish the eff...
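
The softmax backup mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual maximum-entropy softmax value, tau * log(sum(exp(Q / tau))), with a temperature tau; the node layout (a dict with per-action "q" and "r" lists), the function names, and the undiscounted reward-plus-value update are hypothetical choices made for illustration, not details taken from the paper.

```python
import math

def softmax_value(q_values, tau=1.0):
    """Softmax (log-sum-exp) value of a list of action values at temperature tau."""
    m = max(q_values)  # subtract the max for numerical stability
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))

def softmax_backup(path, leaf_value, tau=1.0):
    """Back-propagate a simulation result along a root-to-leaf path.

    path: list of (node, action) pairs ordered from root to leaf, where each
    node is a dict with per-action lists "q" (value estimates) and "r"
    (immediate rewards). This layout is an assumption of the sketch.
    """
    value = leaf_value
    for node, action in reversed(path):
        # Update the taken action's estimate, then summarize the node
        # by the softmax value over all of its action estimates.
        node["q"][action] = node["r"][action] + value
        value = softmax_value(node["q"], tau)
    return value
```

As tau approaches zero the softmax value approaches the ordinary maximum, so this backup reduces to a hard-max backup in that limit; larger tau gives more weight to non-greedy actions.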