Bandit Learning in Convex Non-Strictly Monotone Games

arXiv (2022)

Abstract
We address learning Nash equilibria in convex games under the payoff-based information setting. In this setting, each agent does not know the functional form of her objective and can only observe the value of her objective function at the feasible action profile chosen by her and the other players. We consider the case in which the game pseudo-gradient is monotone but not necessarily strictly monotone. This relaxation of strict monotonicity extends learning algorithms to a larger class of games, such as zero-sum games with non-strictly convex-concave cost functions. We derive an algorithm with provable convergence to Nash equilibria in this setting. Because characterizing the convergence rate of a payoff-based algorithm in a non-strongly monotone game is challenging, we instead view the game as an instance of bandit online optimization. Through this lens, we quantify the algorithm's regret rate and show how to choose its parameters so as to minimize the regret rate while still converging to a Nash equilibrium.
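To make the feedback model concrete, below is a minimal sketch of payoff-based (single-point bandit) learning in a bilinear zero-sum game f(x, y) = x^T A y, whose pseudo-gradient (A y, -A^T x) is monotone but not strictly monotone. This is not the paper's algorithm: it combines the standard single-point spherical gradient estimator with a vanishing Tikhonov regularization term, and the step-size, exploration, and regularization schedules are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2                                    # action dimension per player
A = rng.standard_normal((d, d))          # bilinear zero-sum cost f(x, y) = x^T A y

def project_ball(z, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    n = np.linalg.norm(z)
    return z if n <= radius else radius * z / n

x = np.zeros(d)                          # minimizing player's action
y = np.zeros(d)                          # maximizing player's action
for t in range(1, 100_001):
    eta = 0.5 / t ** 0.75                # step size (hypothetical schedule)
    delta = 1.0 / t ** 0.25              # exploration radius, vanishing
    eps = 1.0 / t ** 0.25                # Tikhonov regularization, vanishing

    # Each player perturbs her action with a random unit vector and
    # observes only the scalar payoff at the played action profile.
    u = rng.standard_normal(d); u /= np.linalg.norm(u)
    v = rng.standard_normal(d); v /= np.linalg.norm(v)
    payoff = (x + delta * u) @ A @ (y + delta * v)

    # Single-point estimates: E[(d/delta) * f * u] approximates grad_x f
    # of a smoothed version of f (likewise for y).
    gx = (d / delta) * payoff * u
    gy = (d / delta) * payoff * v

    # Regularized projected updates: x descends on f + (eps/2)||x||^2,
    # y descends on -f + (eps/2)||y||^2.
    x = project_ball(x - eta * (gx + eps * x))
    y = project_ball(y + eta * (gy - eps * y))

print("final actions:", x, y)            # (0, 0) is an equilibrium of this game
```

The vanishing term eps * x makes each regularized game strongly monotone while recovering the original game in the limit, a standard device for merely monotone problems; convergence toward the equilibrium at the origin is correspondingly slow under single-point feedback.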
Keywords
bandit learning, monotone games, non-strictly monotone games