Efficient off-policy Q-learning for multi-agent systems by solving dual games

International Journal of Robust and Nonlinear Control (2024)

Abstract
This article develops distributed optimal control policies via Q-learning for multi-agent systems (MASs) by solving dual games. First, following game theory, the distributed consensus problem is formulated as a multi-player non-zero-sum game, in which each agent is viewed as a player focusing only on its local performance and the whole MAS achieves Nash equilibrium. Second, for each agent, the anti-disturbance problem is formulated as a two-player zero-sum game, in which the control input and the external disturbance act as opponents. Specifically, (1) an offline, data-driven, off-policy distributed tracking algorithm based on momentum policy gradient (MPG) is developed, which effectively achieves consensus of MASs with a guaranteed $l_2$-bounded synchronization error; (2) an actor-critic-disturbance neural network is employed to implement the MPG algorithm and obtain optimal policies. Finally, numerical and practical simulation results are presented to verify the effectiveness of the tracking policies developed via the MPG algorithm.
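The abstract does not spell out the MPG update, so the following is only a toy sketch of the general idea behind a momentum-based gradient step in a two-player zero-sum game. It is not the paper's algorithm. Everything in it is an assumption made for illustration: the scalar cost `(x + u + w)^2 + r*u^2 - gamma^2*w^2`, the heavy-ball momentum rule, and the names `saddle_gradients`, `momentum_gda`, `lr`, and `beta`. The control `u` descends the cost while the disturbance `w` ascends it.

```python
def saddle_gradients(x, u, w, r, gamma):
    """Gradients of the assumed cost f = (x+u+w)^2 + r*u^2 - gamma^2*w^2."""
    s = x + u + w
    grad_u = 2.0 * s + 2.0 * r * u          # df/du
    grad_w = 2.0 * s - 2.0 * gamma**2 * w   # df/dw
    return grad_u, grad_w


def momentum_gda(x, r=1.0, gamma=2.0, lr=0.05, beta=0.5, steps=200):
    """Simultaneous gradient descent-ascent with heavy-ball momentum.

    u (control) minimizes the cost, w (disturbance) maximizes it.
    All hyperparameters are illustrative assumptions.
    """
    u = w = 0.0
    vel_u = vel_w = 0.0
    for _ in range(steps):
        grad_u, grad_w = saddle_gradients(x, u, w, r, gamma)
        # Momentum: accumulate a decaying sum of past gradients.
        vel_u = beta * vel_u + grad_u
        vel_w = beta * vel_w + grad_w
        u -= lr * vel_u   # minimizing player steps downhill
        w += lr * vel_w   # maximizing player steps uphill
    return u, w


if __name__ == "__main__":
    u_star, w_star = momentum_gda(x=1.75)
    print(u_star, w_star)
```

For these assumed values (`x = 1.75`, `r = 1`, `gamma = 2`), setting both gradients to zero gives the saddle point `u = -1`, `w = 0.25`, which the iteration should approach. In the paper's setting the players' policies are represented by actor, critic, and disturbance networks rather than two scalars.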
Keywords
dual games, momentum policy gradient, multi-agent systems, off-policy