Computational Complexity of Asynchronous Policy Iteration for Two-Player Zero-Sum Markov Games

Chenyu Xu, Sihai Zhang, Zhengdao Wang

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024)

Abstract
Bertsekas recently proposed Asynchronous Policy Iteration (API) as an alternative to Policy Iteration (PI) for solving two-player zero-sum Markov games. To quantify the benefits of API beyond its flexibility for parallel and asynchronous implementation, this paper derives its computational complexity. We show that, to reach within ϵ of the optimal value function, the computational complexity of API is at most O(poly(n, m₁, m₂, ln(1/(1 − γ)))), where n is the number of states, m₁ and m₂ are the numbers of actions for player 1 and player 2, respectively, and γ is the discount factor.
Keywords
Computational complexity, asynchronous policy iteration, two-player zero-sum Markov games