Solving Poker Games Efficiently: Adaptive Memory based Deep Counterfactual Regret Minimization

Shuqing Shi, Xiaobin Wang, Dong Hao, Zhiyou Yang, Hong Qu

2022 International Joint Conference on Neural Networks (IJCNN) (2022)

Abstract

Poker has become one of the most prevalent benchmark environments for developing algorithms for sequential games with imperfect information (SGII). However, in games with a large state space, it is hard to traverse the whole game tree, because the space of histories grows exponentially with the input size of the game. Other attempts, such as truncating the game tree at a certain length, have been made to address this problem, but determining the most suitable length can require an enormous amount of resources. All of these obstacles make algorithms for SGII much harder to design. To address this problem, we propose an adaptive memory sampling method, which aims to find the distribution of the sampling length by using posterior sampling to update it iteratively. In real-world human interaction, how long a human memory lasts often varies significantly depending on the importance of the interaction trajectory. We therefore also adopt a Long Short-Term Memory (LSTM) network as a sub-procedure to classify the histories and to predict future game states and actions based on historical sampled data. According to our theoretical analysis, our method performs better than state-of-the-art algorithms, and the empirical results support this analysis.

Keywords

Imperfect information, Counterfactual Regret Minimization, regret bound, posterior sampling
