Self-adaptive PSRO: Towards an Automatic Population-based Game Solver
IJCAI 2024 (2024)
Abstract
Policy-Space Response Oracles (PSRO), as a general algorithmic framework, has achieved state-of-the-art performance in learning equilibrium policies of two-player zero-sum games. However, the hand-crafted hyperparameter value selection in most existing works requires extensive domain knowledge, forming the main barrier to applying PSRO to different games. In this work, we make the first attempt to investigate the possibility of self-adaptively determining the optimal hyperparameter values in the PSRO framework. Our contributions are three-fold: (1) Using several hyperparameters, we propose a parametric PSRO that unifies the gradient descent ascent (GDA) and different PSRO variants. (2) We propose the self-adaptive PSRO (SPSRO) by casting the hyperparameter value selection of the parametric PSRO as a hyperparameter optimization (HPO) problem, where our objective is to learn an HPO policy that can self-adaptively determine the optimal hyperparameter values during the running of the parametric PSRO. (3) To overcome the poor performance of online HPO methods, we propose a novel offline HPO approach to optimize the HPO policy based on the Transformer architecture. Experiments on various two-player zero-sum games demonstrate the superiority of SPSRO over different baselines.
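For readers unfamiliar with the framework, the generic PSRO loop the abstract builds on can be sketched on a small matrix game. The sketch below is an illustration under stated assumptions, not the paper's parametric PSRO or SPSRO: policies are pure strategies, the best-response oracle is an exact argmax, and the meta-solver is Hedge (multiplicative weights), swapped in for simplicity. All names (`hedge_meta_solver`, `psro`, `steps`, `max_iters`) and the step-size rule are hypothetical choices made here; they are the kind of hand-picked hyperparameters the abstract refers to.

```python
import numpy as np

# Rock-paper-scissors payoffs for the row player (the column player receives the negative).
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])


def hedge_meta_solver(sub_payoff, steps=5000):
    """Approximate a Nash equilibrium of a zero-sum matrix game with Hedge.

    The row player maximizes sub_payoff and the column player minimizes it.
    Returns the time-averaged strategies of both players.
    """
    n_rows, n_cols = sub_payoff.shape
    # Hand-picked step size (an assumption of this sketch, for payoffs in [-1, 1]).
    eta = np.sqrt(8.0 * np.log(max(n_rows, n_cols, 2)) / steps) / 2.0
    row_logits, col_logits = np.zeros(n_rows), np.zeros(n_cols)
    row_avg, col_avg = np.zeros(n_rows), np.zeros(n_cols)
    for _ in range(steps):
        row = np.exp(row_logits - row_logits.max())
        row /= row.sum()
        col = np.exp(col_logits - col_logits.max())
        col /= col.sum()
        row_avg += row
        col_avg += col
        # Simultaneous updates: the row player ascends, the column player descends.
        row_logits += eta * (sub_payoff @ col)
        col_logits -= eta * (row @ sub_payoff)
    return row_avg / steps, col_avg / steps


def exploitability(payoff, row_mix, col_mix):
    """Total gain available to the two players from best-responding."""
    return float(np.max(payoff @ col_mix) - np.min(row_mix @ payoff))


def psro(payoff, max_iters=10):
    """Grow both populations with best responses to the current meta-strategies."""
    row_pop, col_pop = [0], [0]  # each population starts with one pure strategy
    for _ in range(max_iters):
        # 1) Solve the restricted game induced by the current populations.
        sub_payoff = payoff[np.ix_(row_pop, col_pop)]
        row_meta, col_meta = hedge_meta_solver(sub_payoff)
        # 2) Lift the meta-strategies back to the full strategy space.
        row_mix = np.zeros(payoff.shape[0])
        col_mix = np.zeros(payoff.shape[1])
        row_mix[row_pop] = row_meta
        col_mix[col_pop] = col_meta
        # 3) Best-response oracle: exact argmax/argmin over pure strategies.
        row_br = int(np.argmax(payoff @ col_mix))
        col_br = int(np.argmin(row_mix @ payoff))
        grew = False
        if row_br not in row_pop:
            row_pop.append(row_br)
            grew = True
        if col_br not in col_pop:
            col_pop.append(col_br)
            grew = True
        if not grew:  # neither oracle found a new policy: stop expanding
            break
    return row_pop, col_pop, row_mix, col_mix


if __name__ == "__main__":
    row_pop, col_pop, row_mix, col_mix = psro(RPS)
    print("populations:", row_pop, col_pop)
    print("exploitability:", exploitability(RPS, row_mix, col_mix))
```

Even in this toy setting, the meta-solver step count, its step size, and the number of PSRO iterations must all be fixed by hand before running, which is the selection problem the paper proposes to hand over to a learned HPO policy.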
Key words
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning; Game Theory and Economic Paradigms -> GTEP: Noncooperative games; Machine Learning -> ML: Game Theory; Machine Learning -> ML: Hyperparameter optimization