Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent
arXiv (2024)
Abstract
Counterfactual regret minimization (CFR) is a family of algorithms for
effectively solving imperfect-information games. It decomposes the total regret
into counterfactual regrets, utilizing local regret minimization algorithms,
such as Regret Matching (RM) or RM+, to minimize them. Recent research
establishes a connection between Online Mirror Descent (OMD) and RM+, paving
the way for an optimistic variant, PRM+, and its extension, PCFR+. However, PCFR+
assigns uniform weights to all iterations when determining regrets, leading to
substantial regrets when facing dominated actions. This work explores
minimizing weighted counterfactual regret with optimistic OMD, resulting in a
novel CFR variant PDCFR+. It integrates PCFR+ and Discounted CFR (DCFR) in a
principled manner, swiftly mitigating negative effects of dominated actions and
consistently leveraging predictions to accelerate convergence. Theoretical
analyses prove that PDCFR+ converges to a Nash equilibrium, particularly under
distinct weighting schemes for regrets and average strategies. Experimental
results demonstrate PDCFR+'s fast convergence in common imperfect-information
games. The code is available at https://github.com/rpSebastian/PDCFRPlus.
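
For readers unfamiliar with the local regret minimizers mentioned above, the following is a minimal sketch of the standard Regret Matching+ (RM+) update at a single decision point. It is not the PDCFR+ update and is not taken from the authors' repository; it applies no iteration weighting or prediction, and the function names (strategy_from_regrets, rm_plus_update, run_rm_plus) are hypothetical, chosen only for illustration.

```python
from typing import List


def strategy_from_regrets(cumulative: List[float]) -> List[float]:
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cumulative]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret: fall back to the uniform strategy.
        return [1.0 / len(cumulative)] * len(cumulative)
    return [p / total for p in positive]


def rm_plus_update(cumulative: List[float], utilities: List[float]) -> List[float]:
    """One RM+ step: add instantaneous regrets, then clip at zero."""
    strategy = strategy_from_regrets(cumulative)
    expected = sum(p * u for p, u in zip(strategy, utilities))
    # Instantaneous regret of an action is its utility minus the expected
    # utility of the current strategy. Clipping at zero is what separates
    # RM+ from plain RM.
    return [max(c + (u - expected), 0.0) for c, u in zip(cumulative, utilities)]


def run_rm_plus(utilities: List[float], iterations: int) -> List[float]:
    """Repeat the update against a fixed utility vector (an illustrative assumption)."""
    cumulative = [0.0] * len(utilities)
    for _ in range(iterations):
        cumulative = rm_plus_update(cumulative, utilities)
    return strategy_from_regrets(cumulative)
```

Against a fixed utility vector in which one action is best, this sketch moves all probability onto that action. The weighted and optimistic variants discussed in the abstract change how the cumulative regrets are accumulated and predicted, which this sketch does not attempt to reproduce.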