Smooth markets: A basic mechanism for organizing gradient-based learners

International Conference on Learning Representations (ICLR), 2020.

We show that Nash equilibria are stable; that, if profits are strictly concave, gradient ascent converges to a Nash equilibrium for all learning rates; and that the dynamics are bounded under reasonable assumptions.

Abstract:

With the success of modern machine learning, it is becoming increasingly important to understand and control how learning algorithms interact. Unfortunately, negative results from game theory show there is little hope of understanding or controlling general n-player games. We therefore introduce smooth markets (SM-games), a class of n-pla...

Introduction
Highlights
  • As artificial agents proliferate, it is increasingly important to analyze, predict and control their collective behavior (Parkes and Wellman, 2015; Rahwan et al, 2019)
  • We investigate how markets structure the behavior of agents
  • Prior work has been restricted to concrete examples, such as auctions and prediction markets, and to strong assumptions, such as convexity
  • We show that (i) Nash equilibria are stable; (ii) if profits are strictly concave, gradient ascent converges to a Nash equilibrium for all learning rates; and (iii) the dynamics are bounded under reasonable assumptions
  • Example 1 shows that coupling concave functions can cause simultaneous gradient ascent to diverge to infinity
  • Machine learning has got a lot of mileage out of treating differentiable modules like plug-and-play lego blocks
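
The divergence and convergence claims in the highlights above can be illustrated with a small numerical sketch. This is not the paper's Example 1: the quadratic profits, the coupling strength, the learning rate of 0.1, and the function name `simultaneous_gradient_ascent` are all assumptions chosen for illustration. Each player's profit is concave in that player's own parameter; the only difference between the two runs is whether the interaction terms are shared or sum to zero.

```python
def simultaneous_gradient_ascent(grad1, grad2, x, y, lr=0.1, steps=200):
    """Both players step at once, each using gradients at the pre-update point."""
    for _ in range(steps):
        gx, gy = grad1(x, y), grad2(x, y)
        x, y = x + lr * gx, y + lr * gy
    return x, y


# Hypothetical general-sum coupling (both players gain from the same term):
#   profit1(x, y) = -x**2 / 2 + 2*x*y   (concave in x)
#   profit2(x, y) = -y**2 / 2 + 2*x*y   (concave in y)
coupled = simultaneous_gradient_ascent(
    lambda x, y: -x + 2 * y,  # d profit1 / dx
    lambda x, y: -y + 2 * x,  # d profit2 / dy
    x=1.0, y=1.0,
)

# Hypothetical zero-sum coupling (what player 1 gains, player 2 loses):
#   profit1(x, y) = -x**2 / 2 + 2*x*y
#   profit2(x, y) = -y**2 / 2 - 2*x*y
zero_sum = simultaneous_gradient_ascent(
    lambda x, y: -x + 2 * y,  # d profit1 / dx
    lambda x, y: -y - 2 * x,  # d profit2 / dy
    x=1.0, y=1.0,
)

print("shared coupling:  ", coupled)   # iterates grow without bound
print("zero-sum coupling:", zero_sum)  # iterates shrink towards (0, 0)
```

In the first run the iterates grow geometrically along the direction x = y, while in the second they spiral in towards the origin, which is the Nash equilibrium of this toy zero-sum game. The sketch only checks one learning rate and says nothing about the general theorems.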
Conclusion
  • Machine learning has got a lot of mileage out of treating differentiable modules like plug-and-play lego blocks.
  • The authors' main result is that SM-games are legible: changes in aggregate forecasts are the sum of how individual firms expect their forecasts to change
  • It follows that the authors can translate properties of the individual firms into guarantees on collective convergence, stability and boundedness in SM-games; see Theorems 4-6. It is natural to expect individually well-behaved agents to behave well collectively.
  • Off-the-shelf optimizers such as Adam (Kingma and Ba, 2015) modify learning rates under the hood, which may destabilize some games
Summary
  • Introduction:

    As artificial agents proliferate, it is increasingly important to analyze, predict and control their collective behavior (Parkes and Wellman, 2015; Rahwan et al, 2019).
  • Nash equilibria provide a general solution concept, but are intractable in almost all cases for many different reasons (Babichenko, 2016; Daskalakis et al, 2009; Hart and Mas-Colell, 2003)
  • These and other negative results (Palaiopanos et al, 2017) suggest that understanding and controlling societies of artificial agents is near hopeless.
  • For us, markets encompass discriminators and generators trading errors in GANs (Goodfellow et al, 2014) and agents trading wins and losses in StarCraft (Vinyals et al, 2019)
  • Objectives:

    A wide variety of machine learning markets and agent-based economies have been proposed and studied: Abernethy and Frongillo (2011); Balduzzi (2014); Barto et al (1983); Baum (1999); Hu and Storkey (2014); Kakade et al (2003; 2005); Kearns et al (2001); Kwee et al (2001); Lay and Barbu (2010); Minsky (1986); Selfridge (1958); Storkey (2011); Storkey et al (2012); Sutton et al (2011); Wellman and Wurman (1998).
Contributions
  • Shows that SM-games are amenable to analysis and optimization using first-order methods
  • Presents some pathologies that arise in even the simplest smooth games
  • Shows that Nash equilibria are stable; that, if profits are strictly concave, gradient ascent converges to a Nash equilibrium for all learning rates; and that the dynamics are bounded under reasonable assumptions
Reference
  • Abernethy, J. and Frongillo, R. (2011). A Collaborative Mechanism for Crowdsourcing Prediction Problems. In NeurIPS.
  • Abernethy, J., Lai, K. A., and Wibisono, A. (2019). Last-iterate convergence rates for min-max optimization. In arXiv:1906.02027.
  • Babichenko, Y. (2016). Query Complexity of Approximate Nash Equilibria. Journal of the ACM, 63(4).
  • Bailey, J. P., Gidel, G., and Piliouras, G. (2019). Finite Regret and Cycles with Fixed Step-Size via Alternating Gradient Descent-Ascent. In arXiv:1907.04392.
  • Bailey, J. P. and Piliouras, G. (2018). Multiplicative Weights Update in Zero-Sum Games. In ACM EC.
  • Balduzzi, D. (2014). Cortical prediction markets. In AAMAS.
  • Balduzzi, D., Racaniere, S., Martens, J., Foerster, J., Tuyls, K., and Graepel, T. (2018). The mechanics of n-player differentiable games. In ICML.
  • Barto, A. G., Sutton, R. S., and Anderson, C. W. (1983). Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems. IEEE Trans. Systems, Man, Cyb, 13(5):834–846.
  • Baum, E. B. (1999). Toward a Model of Intelligence as an Economy of Agents. Machine Learning, 35:155–185.
  • Berard, H., Gidel, G., Almahairi, A., Vincent, P., and Lacoste-Julien, S. (2019). A Closer Look at the Optimization Landscapes of Generative Adversarial Networks. In arXiv:1906.04848.
  • Cai, Y., Candogan, O., Daskalakis, C., and Papadimitriou, C. (2016). Zero-sum Polymatrix Games: A Generalization of Minmax. Mathematics of Operations Research, 41(2):648–655.
  • Daskalakis, C., Goldberg, P. W., and Papadimitriou, C. (2009). The Complexity of Computing a Nash Equilibrium. SIAM J. Computing, 39(1):195–259.
  • Drexler, K. E. (2019). Reframing Superintelligence: Comprehensive AI Services as General Intelligence. Future of Humanity Institute, University of Oxford, Technical Report #2019-1.
  • Gemp, I. and Mahadevan, S. (2017). Online Monotone Games. In arXiv:1710.07328.
  • Gemp, I. and Mahadevan, S. (2018). Global Convergence to the Equilibrium of GANs using Variational Inequalities. In arXiv:1808.01531.
  • Gidel, G., Hemmat, R. A., Pezeshki, M., Lepriol, R., Huang, G., Lacoste-Julien, S., and Mitliagkas, I. (2019). Negative Momentum for Improved Game Dynamics. In AISTATS.
  • Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative Adversarial Nets. In NeurIPS.
  • Hart, S. and Mas-Colell, A. (2003). Uncoupled Dynamics Do Not Lead to Nash Equilibrium. American Economic Review, 93(5):1830–1836.
  • Hu, J. and Storkey, A. (2014). Multi-period Trading Prediction Markets with Connections to Machine Learning. In ICML.
  • Kakade, S., Kearns, M., and Ortiz, L. (2003). Graphical economics. In COLT.
  • Kakade, S., Kearns, M., Ortiz, L., Pemantle, R., and Suri, S. (2005). Economic properties of social networks. In NeurIPS.
  • Kearns, M., Littman, M., and Singh, S. (2001). Graphical models for game theory. In UAI.
  • Kingma, D. P. and Ba, J. L. (2015). Adam: A method for stochastic optimization. In ICLR.
  • Kurakin, A., Goodfellow, I., and Bengio, S. (2017). Adversarial Machine Learning at Scale. In ICLR.
  • Kwee, I., Hutter, M., and Schmidhuber, J. (2001). Market-based reinforcement learning in partially observable worlds. In ICANN.
  • Lay, N. and Barbu, A. (2010). Supervised aggregation of classifiers using artificial prediction markets. In ICML.
  • Letcher, A., Balduzzi, D., Racaniere, S., Martens, J., Foerster, J., Tuyls, K., and Graepel, T. (2019). Differentiable Game Mechanics. JMLR, 20:1–40.
  • Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. In ICLR.
  • Mescheder, L. (2018). On the convergence properties of GAN training. In arXiv:1801.04406.
  • Mescheder, L., Nowozin, S., and Geiger, A. (2017). The Numerics of GANs. In NeurIPS.
  • Minsky, M. (1986). The society of mind. Simon and Schuster, New York NY.
  • Monderer, D. and Shapley, L. S. (1996). Potential Games. Games and Economic Behavior, 14:124–143.
  • Nemirovski, A., Onn, S., and Rothblum, U. G. (2010). Accuracy certificates for computational problems with convex structure. Mathematics of Operations Research, 35(1).
  • Nisan, N., Roughgarden, T., Tardos, E., and Vazirani, V., editors (2007). Algorithmic Game Theory. Cambridge University Press, Cambridge.
  • Palaiopanos, G., Panageas, I., and Piliouras, G. (2017). Multiplicative Weights Update with Constant Step-Size in Congestion Games: Convergence, Limit Cycles and Chaos. In NeurIPS.
  • Parkes, D. C. and Wellman, M. P. (2015). Economic reasoning and artificial intelligence. Science, 349(6245):267–272.
  • Pathak, D., Agrawal, P., Efros, A. A., and Darrell, T. (2017). Curiosity-driven Exploration by Self-supervised Prediction. In ICML.
  • Rahwan, I., Cebrian, M., Obradovich, N., Bongard, J., Bonnefon, J.-F., Breazeal, C., Crandall, J. W., Christakis, N. A., Couzin, I. D., Jackson, M. O., Jennings, N. R., Kamar, E., Kloumann, I. M., Larochelle, H., Lazer, D., Mcelreath, R., Mislove, A., Parkes, D. C., Pentland, A. S., Roberts, M. E., Shariff, A., Tenenbaum, J. B., and Wellman, M. (2019). Machine behaviour. Nature, 568:477–486.
  • Scott, J. (1999). Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed. Yale University Press.
  • Selfridge, O. G. (1958). Pandemonium: a paradigm for learning. In Mechanisation of Thought Processes: Proc Symposium Held at the National Physical Laboratory.
  • Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T. P., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489.
  • Smith, A. (1776). The Wealth of Nations. W. Strahan and T. Cadell, London.
  • Storkey, A. (2011). Machine Learning Markets. In AISTATS.
  • Storkey, A., Millin, J., and Geras, K. (2012). Isoelastic Agents and Wealth Updates in Machine Learning Markets. In ICML.
  • Sutton, R., Modayil, J., Delp, M., Degris, T., Pilarski, P. M., White, A., and Precup, D. (2011). Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Motor Interaction. In AAMAS.
  • Tatarenko, T. and Kamgarpour, M. (2019). Learning Generalized Nash Equilibria in a Class of Convex Games. IEEE Transactions on Automatic Control, 64(4):1426–1439.
  • Vickrey, W. (1961). Counterspeculation, Auctions and Competitive Sealed Tenders. J Finance, 16:8–37.
  • Vinyals, O., Babuschkin, I., Chung, J., Mathieu, M., Jaderberg, M., Czarnecki, W. M., Dudzik, A., Huang, A., Georgiev, P., Powell, R., Ewalds, T., Horgan, D., Kroiss, M., Danihelka, I., Agapiou, J., Oh, J., Dalibard, V., Choi, D., Sifre, L., Sulsky, Y., Vezhnevets, S., Molloy, J., Cai, T., Budden, D., Paine, T., Gulcehre, C., Wang, Z., Pfaff, T., Pohlen, T., Wu, Y., Yogatama, D., Cohen, J., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Apps, C., Kavukcuoglu, K., Hassabis, D., and Silver, D. (2019). AlphaStar: Mastering the Real-Time Strategy Game StarCraft II. https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/.
  • von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100(1):295–320.
  • von Neumann, J. and Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press, Princeton NJ.
  • Wellman, M. P. and Wurman, P. R. (1998). Market-aware agents for a multiagent world. Robotics and Autonomous Systems, 24:115–125.
  • Wu, Y., Donahue, J., Balduzzi, D., Simonyan, K., and Lillicrap, T. (2019). LOGAN: Latent Optimisation for Generative Adversarial Networks. In arXiv:1912.00953.
  • Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In CVPR.