Contextual Games: Multi-Agent Learning with Side Information

34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.

Abstract:

We formulate the novel class of contextual games, repeated games driven by contextual information at each round. By means of kernel-based regularity assumptions, we model the correlation between different contexts and game outcomes and propose a novel online (meta) algorithm that exploits such correlations to minimize the players' contextual regret. We further define contextual Coarse Correlated Equilibria (c-CCEs) and a notion of contextual welfare, and show that both can be approached in a decentralized fashion whenever players minimize their contextual regrets. Finally, we validate our approach in a traffic routing experiment, where our algorithms lead to reduced travel times and more efficient outcomes than baselines that do not exploit the observed contexts or the correlations present in the game.
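
The kernel-based meta-algorithm is not spelled out on this page, but its high-level loop can be illustrated. The following is a minimal, hypothetical Python sketch (the class name ContextualHedgePlayer, the RBF kernel ridge estimator, and the use of the last observed opponent actions are our own simplifications, not the authors' algorithm): each player regresses its observed rewards on (context, own action, opponents' actions) features and feeds the resulting per-action estimates into an exponential-weights randomized choice.

```python
import numpy as np

def rbf(X, Y, ls=1.0):
    # Squared-exponential kernel between row-stacked feature matrices.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ls ** 2))

class ContextualHedgePlayer:
    """Hypothetical sketch of one player in a contextual game.

    A kernel ridge regressor estimates the reward of each own action in
    the observed context; a Hedge-style exponential weighting over those
    estimates randomizes the chosen action.
    """

    def __init__(self, n_actions, eta=0.5, lam=1e-2):
        self.n_actions, self.eta, self.lam = n_actions, eta, lam
        self.X, self.y = [], []  # past (context, action, others) features / rewards

    def _feat(self, z, a, a_others):
        return np.concatenate([np.atleast_1d(z), [a], np.atleast_1d(a_others)])

    def act(self, z, a_others, rng):
        if not self.X:  # no data yet: play uniformly at random
            return int(rng.integers(self.n_actions))
        X = np.stack(self.X)
        K = rbf(X, X) + self.lam * np.eye(len(X))
        alpha = np.linalg.solve(K, np.asarray(self.y, dtype=float))
        # Reward estimate of each own action, given the current context and
        # (as a simplification) the last observed opponent actions.
        Q = np.stack([self._feat(z, a, a_others) for a in range(self.n_actions)])
        est = rbf(Q, X) @ alpha
        w = np.exp(self.eta * (est - est.max()))  # exponential weights
        return int(rng.choice(self.n_actions, p=w / w.sum()))

    def record(self, z, a, a_others, reward):
        self.X.append(self._feat(z, a, a_others))
        self.y.append(reward)
```

The authors' earlier GP-MW algorithm for unknown games with correlated payoffs [32] combines multiplicative weights with optimistic Gaussian process reward estimates; the sketch only mirrors that structure in a stripped-down form.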

Introduction
  • Several important real-world problems in economics, engineering, and computer science involve repeated interactions among self-interested agents with coupled objectives.
  • An important line of research has focused, on the one hand, on characterizing game-theoretic equilibria and their efficiency and, on the other hand, on deriving fast learning algorithms that converge to equilibria and efficient outcomes.
  • Most of these results are based on the assumption that the players always face the exact same game, repeated over time.
  • In many such settings, however, the game played at each round depends on exogenous, time-varying factors. Players can observe such factors and could take better decisions depending on the circumstances.
Highlights
  • Several important real-world problems in economics, engineering, and computer science involve repeated interactions among self-interested agents with coupled objectives
  • We show that contextual Coarse Correlated Equilibria (c-CCE) and contextual welfare can be approached in a decentralized fashion whenever players minimize their contextual regrets, recovering important game-theoretic results for our larger class of games
  • Our algorithms effectively use the available contextual information to minimize agents’ travel times and converge to more efficient outcomes compared to other baselines that do not exploit the observed contexts and/or the correlations present in the game
  • We have introduced the class of contextual games, a type of repeated games described by contextual information at each round
  • Using kernel-based regularity assumptions, we modeled the correlation between different contexts and game outcomes, and proposed novel online algorithms that exploit such correlations to minimize the players’ contextual regret (formalized after this list)
  • We show that significantly improved guarantees are achievable when contexts are i.i.d. samples from a static distribution
  • The obtained results were validated in a traffic routing experiment, where our algorithms led to reduced travel times and more efficient outcomes compared to other baselines that do not exploit the observed contexts or the correlation present in the game
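
To fix ideas, the two notions above can be written down as follows. The notation (context set Z with contexts z_t, action set A_i of player i, reward r_i, joint action a_t split into player i's action and the others' actions, policies mapping contexts to actions) is ours and is only a plausible formalization consistent with these highlights, not necessarily the paper's exact definitions.

```latex
% Contextual regret of player i after T rounds: the benchmark is the best
% fixed policy mapping contexts to actions (assumed notation, see lead-in).
\[
  R_i(T) \;=\; \max_{\pi : \mathcal{Z} \to \mathcal{A}_i} \;
  \sum_{t=1}^{T} \Bigl( r_i\bigl(\pi(z_t), a_t^{-i}, z_t\bigr)
                      - r_i\bigl(a_t^{i}, a_t^{-i}, z_t\bigr) \Bigr).
\]
% A contextual CCE maps each context z to a joint action distribution
% \sigma(z) from which no player gains by deviating to any policy:
\[
  \mathbb{E}_{a \sim \sigma(z)} \bigl[ r_i(a, z) \bigr] \;\ge\;
  \mathbb{E}_{a \sim \sigma(z)} \bigl[ r_i\bigl(\pi(z), a^{-i}, z\bigr) \bigr]
  \quad \text{for all } i,\ \pi : \mathcal{Z} \to \mathcal{A}_i,\ z \in \mathcal{Z}.
\]
```

Read this way, the highlight that vanishing contextual regrets approach c-CCEs is the contextual analogue of the classical result that no-regret play converges to the set of CCEs [17, 18].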
Results
  • The authors show that significantly improved guarantees are achievable when contexts are i.i.d. samples from a static distribution.
Conclusion
  • The authors have introduced the class of contextual games, repeated games driven by contextual information at each round.
  • The obtained results were validated in a traffic routing experiment, where the algorithms led to reduced travel times and more efficient outcomes compared to baselines that do not exploit the observed contexts or the correlations present in the game (a toy version of this setting is sketched after this list).
  • Examples range from road traffic and auctions to financial markets and robotic systems.
  • Understanding these interactions and their effects on individual participants, as well as on the reliability of the overall system, becomes ever more important.
  • The authors believe this work contributes positively to this challenge by studying principled learning algorithms that are efficient, while converging to suitable, and often efficient, equilibria
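
To make the traffic routing setting concrete, here is a small self-contained toy with entirely hypothetical numbers: a two-route network, a BPR-style latency, and a binary weather-like context that scales one route's capacity. Each player keeps per-context running means of its observed travel times and routes epsilon-greedily; the paper's actual experiment uses a real road network [1, 25] and the kernel-based algorithms described above, not this simple learner.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 20, 500                     # players and rounds (hypothetical sizes)

def travel_time(route, load, z):
    # BPR-style latency; the context z scales the capacity of route 0 only.
    cap = (10.0 * z) if route == 0 else 10.0
    return 1.0 + (load / cap) ** 2

means = np.zeros((N, 2, 2))        # (player, binary context, route) mean times
counts = np.zeros_like(means)

for t in range(T):
    z = rng.choice([0.5, 1.5])     # i.i.d. context, e.g. bad/good weather
    zi = int(z > 1.0)              # each player discretizes the context
    explore = rng.random(N) < 0.1  # epsilon-greedy exploration
    greedy = means[:, zi, :].argmin(axis=1)
    routes = np.where(explore, rng.integers(2, size=N), greedy)
    loads = np.bincount(routes, minlength=2)
    for i in range(N):             # each player observes its own travel time
        r = routes[i]
        c = travel_time(r, loads[r], z)
        counts[i, zi, r] += 1
        means[i, zi, r] += (c - means[i, zi, r]) / counts[i, zi, r]

print("player 0 mean travel times per (context, route):")
print(means[0])                    # context-dependent route preferences emerge
```

A context-blind baseline would keep a single table over routes; in bad-weather rounds (z = 0.5) it keeps routing as if route 0 still had full capacity, which is precisely the gap that context-exploiting algorithms close.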
Related work
  • Learning in repeated static games has been extensively studied in the literature. The seminal works [17, 18] show that simple no-regret strategies for the players converge to the set of Coarse Correlated Equilibria (CCEs), while the efficiency of such equilibria and learning dynamics has been studied in [6, 31]. Exploiting the static game structure, moreover, [38, 15] propose faster learning algorithms, and a long array of works (e.g., [35, 7, 4]) study convergence to Nash equilibria. Learning in time-varying games has recently been considered in [14], where the authors show that dynamic regret minimization allows players to track the sequence of Nash equilibria, provided that the stage games are monotone and slowly varying. Adversarially changing zero-sum games have also been studied [10], with convergence guarantees to the Nash equilibrium of the time-averaged game. Our contextual games model is fundamentally different from [14, 10] in that we assume players observe the current context (and hence have prior information about the game) before playing. This leads to new equilibria and to a different performance benchmark, the contextual regret, defined with respect to the best policies mapping contexts to actions. Perhaps closest to our setup are stochastic (or Markov) games [34], which are at the core of multi-agent reinforcement learning (see [9] for an overview). There, players observe the state of the game before playing, but, differently from our setup, the evolution of the state depends on the actions chosen at each round. This leads to a nested game structure, which requires significant computational power and coordination among players to compute equilibrium strategies via backward induction [16, 13]. We instead consider arbitrary context sequences (potentially chosen by an adversarial Nature) and show that efficient algorithms converge to our equilibria in a decentralized fashion.
Funding
  • This work was gratefully supported by the Swiss National Science Foundation, under the grant SNSF 200021_172781, by the European Union’s ERC grant 815943, and ETH Zürich Postdoctoral Fellowship 19-2 FEL-47
References
  • [1] Transportation Network Test Problems. http://www.bgu.ac.il/bargera/tntp/.
  • [2] Yasin Abbasi-Yadkori. Online learning for linearly parametrized control problems. 2013.
  • [3] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 2003.
  • [4] David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, and Thore Graepel. The mechanics of n-player differentiable games. In International Conference on Machine Learning (ICML), 2018.
  • [5] Santiago Balseiro, Negin Golrezaei, Mohammad Mahdian, Vahab Mirrokni, and Jon Schneider. Contextual bandits with cross-learning. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
  • [6] Avrim Blum, Mohammad Taghi Hajiaghayi, Katrina Ligett, and Aaron Roth. Regret minimization and the price of total anarchy. In Annual ACM Symposium on Theory of Computing (STOC), 2008.
  • [7] Michael Bowling. Convergence and no-regret in multiagent learning. In Advances in Neural Information Processing Systems (NeurIPS), 2005.
  • [8] S. Bubeck and N. Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. 2012.
  • [9] Lucian Busoniu, Robert Babuška, and Bart De Schutter. Multi-agent Reinforcement Learning: An Overview. Springer Berlin Heidelberg, 2010.
  • [10] Adrian Rivera Cardoso, Jacob Abernethy, He Wang, and Huan Xu. Competing against Nash equilibria in adversarially changing zero-sum games. In International Conference on Machine Learning (ICML), 2019.
  • [11] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
  • [12] Nando de Freitas, Alex Smola, and Masrour Zoghi. Regret bounds for deterministic Gaussian process bandits. arXiv preprint arXiv:1203.2177, 2012.
  • [13] Liam M. Dermed and Charles L. Isbell. Solving stochastic games. In Advances in Neural Information Processing Systems (NeurIPS), 2009.
  • [14] Benoit Duvocelle, Panayotis Mertikopoulos, Mathias Staudigl, and Dries Vermeulen. Learning in time-varying games. arXiv preprint arXiv:1809.03066, 2018.
  • [15] Dylan J. Foster, Zhiyuan Li, Thodoris Lykouris, Karthik Sridharan, and Éva Tardos. Learning in games: Robustness of fast convergence. In Advances in Neural Information Processing Systems (NeurIPS), 2016.
  • [16] Amy Greenwald and Keith Hall. Correlated-Q learning. In International Conference on Machine Learning (ICML), 2003.
  • [17] James Hannan. Approximation to Bayes risk in repeated play. Princeton University Press, 1957.
  • [18] Sergiu Hart and Andreu Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 2000.
  • [19] Jason Hartline, Vasilis Syrgkanis, and Éva Tardos. No-regret learning in Bayesian games. In Advances in Neural Information Processing Systems (NeurIPS), 2015.
  • [20] Elad Hazan and Nimrod Megiddo. Online learning with prior knowledge. In Annual Conference on Learning Theory (COLT), 2007.
  • [21] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 1963.
  • [22] Andreas Krause and Cheng S. Ong. Contextual Gaussian process bandit optimization. In Advances in Neural Information Processing Systems (NeurIPS), 2011.
  • [23] Robert Krauthgamer and James R. Lee. Navigating nets: Simple algorithms for proximity search. In Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2004.
  • [24] John Langford and Tong Zhang. The epoch-greedy algorithm for multi-armed bandits with side information. In Advances in Neural Information Processing Systems (NeurIPS), 2008.
  • [25] Larry J. LeBlanc, Edward K. Morlok, and William P. Pierskalla. An efficient approach to solving the road network equilibrium traffic assignment problem. Transportation Research, 9, 1975.
  • [26] Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation, 1994.
  • [27] Jaouad Mourtada and Stéphane Gaïffas. On the optimality of the Hedge algorithm in the stochastic regime. Journal of Machine Learning Research, 2019.
  • [28] Gergely Neu and Julia Olkhovskaya. Efficient and robust algorithms for adversarial linear contextual bandits. arXiv preprint arXiv:2002.00287, 2020.
  • [29] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
  • [30] Tim Roughgarden. Routing Games. Cambridge University Press, 2007.
  • [31] Tim Roughgarden. Intrinsic robustness of the price of anarchy. Journal of the ACM, 2015.
  • [32] Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, and Andreas Krause. No-regret learning in unknown games with correlated payoffs. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
  • [33] Pier Giuseppe Sessa, Maryam Kamgarpour, and Andreas Krause. Bounding inefficiency of equilibria in continuous actions games using submodularity and curvature. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
  • [34] L. S. Shapley. Stochastic games. Proceedings of the National Academy of Sciences, 1953.
  • [35] Satinder P. Singh, Michael J. Kearns, and Yishay Mansour. Nash convergence of gradient dynamics in general-sum games. In Conference on Uncertainty in Artificial Intelligence (UAI), 2000.
  • [36] Aleksandrs Slivkins. Contextual bandits with similarity information. In Proceedings of Machine Learning Research, volume 19, pages 679–702, 2011.
  • [37] Niranjan Srinivas, Andreas Krause, Sham M. Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In International Conference on Machine Learning (ICML), 2010.
  • [38] Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E. Schapire. Fast convergence of regularized learning in games. In Advances in Neural Information Processing Systems (NeurIPS), 2015.
  • [39] Vasilis Syrgkanis, Haipeng Luo, Akshay Krishnamurthy, and Robert E. Schapire. Improved regret bounds for oracle-based adversarial contextual bandits. In Advances in Neural Information Processing Systems (NeurIPS), 2016.
  • [40] Vasilis Syrgkanis and Éva Tardos. Composable and efficient mechanisms. In Annual ACM Symposium on Theory of Computing (STOC), 2013.