Mastering construction heuristics with self-play deep reinforcement learning

Neural Computing & Applications (2022)

Abstract
Learning heuristics that construct solutions automatically, without expert experience, has long been a central challenge in combinatorial optimization. Building an agent whose planning ability lets it solve multiple problems simultaneously is likewise a goal of artificial intelligence. Nonetheless, most current learning-based methods for combinatorial optimization still rely on hand-designed heuristics. In real-world problems, the environment’s dynamics are often unknown and complex, which makes current methods hard to generalize and deploy. Inspired by AlphaGo Zero, we propose CH-Zero, a novel self-play reinforcement learning algorithm based on Monte Carlo tree search (MCTS) for routing optimization problems. Like AlphaGo Zero, CH-Zero requires no expert experience, only the necessary rules of the problem. Unlike other MCTS-based self-play algorithms, however, it separates offline training from online inference. Specifically, we train the policy and value networks offline using self-play reinforcement learning without MCTS. We then combine the learned heuristics and neural networks with MCTS to make inferences on unseen instances. Because MCTS is not used during training, the training stage amounts to a lightweight self-play framework whose learning efficiency is much higher than that of existing self-play-based methods for combinatorial optimization. At runtime, the learned heuristics guide MCTS to improve the policy and select better actions.
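The online stage described in the abstract, in which a learned policy and value guide MCTS while a solution is constructed step by step, can be illustrated with a small sketch. This is not the CH-Zero implementation: the problem (a tiny travelling salesman instance), the PUCT selection rule borrowed from AlphaGo Zero, and every name below (`prior`, `value`, `Node`, `search`, `solve`, `c_puct`) are assumptions made for illustration. In particular, `prior` and `value` are hand-written stand-ins for the trained policy and value networks.

```python
import math

def tour_length(tour, dist):
    """Length of the closed tour that visits cities in the given order."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def prior(partial, remaining, dist):
    """Stand-in for a policy network: softmax over negative distance to each candidate."""
    cands = sorted(remaining)
    weights = [math.exp(-dist[partial[-1]][c]) for c in cands]
    total = sum(weights)
    return {c: w / total for c, w in zip(cands, weights)}

def value(partial, remaining, dist):
    """Stand-in for a value network: negative length of a greedy completion."""
    tour, left = list(partial), set(remaining)
    while left:
        nxt = min(left, key=lambda c: dist[tour[-1]][c])
        tour.append(nxt)
        left.remove(nxt)
    return -tour_length(tour, dist)

class Node:
    def __init__(self, partial, remaining, p):
        self.partial, self.remaining, self.p = partial, remaining, p
        self.children = {}   # next city -> Node
        self.n = 0           # visit count
        self.w = 0.0         # sum of backed-up values

def search(root, dist, n_sim, c_puct=1.0):
    """Run n_sim PUCT simulations starting from root."""
    for _ in range(n_sim):
        node, path = root, [root]
        # Selection: descend through expanded nodes using the PUCT score.
        while node.children:
            parent = node
            def score(ch):
                # Unvisited children inherit the parent's mean value.
                q = ch.w / ch.n if ch.n else parent.w / parent.n
                return q + c_puct * ch.p * math.sqrt(parent.n) / (1 + ch.n)
            node = max(parent.children.values(), key=score)
            path.append(node)
        # Expansion: add one child per city that can still be visited.
        for c, p in prior(node.partial, node.remaining, dist).items():
            node.children[c] = Node(node.partial + [c], node.remaining - {c}, p)
        # Evaluation and backup along the visited path.
        v = value(node.partial, node.remaining, dist)
        for visited in path:
            visited.n += 1
            visited.w += v

def solve(dist, n_sim=200):
    """Construct a tour city by city, committing to the most-visited child each step."""
    root = Node([0], frozenset(range(1, len(dist))), 1.0)
    while root.remaining:
        search(root, dist, n_sim)
        root = max(root.children.values(), key=lambda ch: ch.n)
    return root.partial

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.2), (0.2, 0.7)]
    dist = [[math.dist(a, b) for b in pts] for a in pts]
    tour = solve(dist)
    print(tour, round(tour_length(tour, dist), 3))
```

In a setup like the one the abstract describes, the two stand-in functions would be replaced by networks trained offline through self-play without tree search, so the search cost is paid only at inference time.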
Keywords
Combinatorial optimization, Deep learning, Reinforcement learning, Monte Carlo tree search