# Smooth markets: A basic mechanism for organizing gradient-based learners

International Conference on Learning Representations, 2020.

Abstract:

With the success of modern machine learning, it is becoming increasingly important to understand and control how learning algorithms interact. Unfortunately, negative results from game theory show there is little hope of understanding or controlling general n-player games. We therefore introduce smooth markets (SM-games), a class of n-player…

Introduction

- As artificial agents proliferate, it is increasingly important to analyze, predict and control their collective behavior (Parkes and Wellman, 2015; Rahwan et al, 2019).
- Nash equilibria provide a general solution concept, but are intractable in almost all cases for many different reasons (Babichenko, 2016; Daskalakis et al, 2009; Hart and Mas-Colell, 2003)
- These and other negative results (Palaiopanos et al, 2017) suggest that understanding and controlling societies of artificial agents is near hopeless.
- Our examples encompass discriminators and generators trading errors in GANs (Goodfellow et al, 2014) and agents trading wins and losses in StarCraft (Vinyals et al, 2019)

Highlights

- As artificial agents proliferate, it is increasingly important to analyze, predict and control their collective behavior (Parkes and Wellman, 2015; Rahwan et al, 2019)
- We investigate how markets structure the behavior of agents
- Prior work has been restricted to concrete examples, such as auctions and prediction markets, or to strong assumptions, such as convexity
- We show that (i) Nash equilibria are stable; (ii) if profits are strictly concave, gradient ascent converges to a Nash equilibrium for all learning rates; and (iii) the dynamics are bounded under reasonable assumptions
- Example 1 shows that coupling concave functions can cause simultaneous gradient ascent to diverge to infinity
- Machine learning has got a lot of mileage out of treating differentiable modules like plug-and-play lego blocks
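The two regimes in the highlights above can be sketched numerically. The profit functions below are hypothetical illustrations, not the paper's exact Example 1: with linear (hence weakly concave) coupled profits, simultaneous gradient ascent spirals out to infinity, while adding strictly concave own terms makes it contract toward the Nash equilibrium at the origin.

```python
def simulate(grad1, grad2, x, y, lr=0.1, steps=1000):
    """Simultaneous gradient ascent on a two-player game; returns final (x, y)."""
    for _ in range(steps):
        gx, gy = grad1(x, y), grad2(x, y)
        x, y = x + lr * gx, y + lr * gy  # both players update at once
    return x, y

# Divergent case: pi1 = x*y, pi2 = -x*y. Each profit is linear, hence
# concave, in the player's own variable, yet each step multiplies
# x^2 + y^2 by (1 + lr^2), so the iterates blow up.
x, y = simulate(lambda x, y: y, lambda x, y: -x, 1.0, 0.0)
print(x * x + y * y > 1e3)  # True

# Strictly concave case: pi1 = -x^2/2 + x*y, pi2 = -y^2/2 - x*y.
# The strictly concave own terms make the dynamics contract to the
# Nash equilibrium at the origin.
x, y = simulate(lambda x, y: -x + y, lambda x, y: -y - x, 1.0, 0.0)
print(abs(x) + abs(y) < 1e-3)  # True
```

The coupling in both cases is pairwise zero-sum; only the presence or absence of the strictly concave own terms separates divergence from convergence.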

Conclusion

- Machine learning has got a lot of mileage out of treating differentiable modules like plug-and-play lego blocks.
- The authors' main result is that SM-games are legible: changes in aggregate forecasts are the sum of how individual firms expect their forecasts to change
- It follows that the authors can translate properties of the individual firms into guarantees on collective convergence, stability and boundedness in SM-games, see Theorems 4-6. It is natural to expect individually well-behaved agents to behave well collectively.
- Off-the-shelf optimizers such as Adam (Kingma and Ba, 2015) modify learning rates under the hood, which may destabilize some games


Related work

- A wide variety of machine learning markets and agent-based economies have been proposed and studied: Abernethy and Frongillo (2011); Balduzzi (2014); Barto et al (1983); Baum (1999); Hu and Storkey (2014); Kakade et al (2003; 2005); Kearns et al (2001); Kwee et al (2001); Lay and Barbu (2010); Minsky (1986); Selfridge (1958); Storkey (2011); Storkey et al (2012); Sutton et al (2011); Wellman and Wurman (1998). The goal of this paper is different. Rather than propose another market mechanism, we abstract an existing design pattern and elucidate some of its consequences for interacting agents.

Our approach draws on work studying convergence in generative adversarial networks (Balduzzi et al, 2018; Gemp and Mahadevan, 2018; Gidel et al, 2019; Mescheder, 2018; Mescheder et al, 2017), related minimax problems (Abernethy et al, 2019; Bailey and Piliouras, 2018), and monotone games (Gemp and Mahadevan, 2017; Nemirovski et al, 2010; Tatarenko and Kamgarpour, 2019).

Caveat

We consider dynamics in continuous time, dw/dt = ξ(w), in this paper. Discrete dynamics, w_{t+1} ← w_t + η·ξ(w_t), require a more delicate analysis, e.g. Bailey et al (2019). In particular, we do not claim that optimizing GANs and SM-games is easy in discrete time. Rather, our analysis shows that it is relatively easy in continuous time, and therefore possible in discrete time, with some additional effort.
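The continuous/discrete gap in this caveat can be made concrete on a minimal bilinear game (a hypothetical example, not from the paper): the continuous-time flow dx/dt = y, dy/dt = -x is an exact rotation and conserves x² + y², while explicit Euler discretization with step size η multiplies the norm by (1 + η²) each step, so the same dynamics that cycle in continuous time drift outward in discrete time.

```python
import math

def euler(x, y, eta, steps):
    """Explicit Euler / simultaneous gradient steps on the bilinear game."""
    for _ in range(steps):
        x, y = x + eta * y, y - eta * x
    return x * x + y * y  # squared norm after the run

def exact(x, y, t):
    """The continuous-time flow is a rotation by angle t."""
    c, s = math.cos(t), math.sin(t)
    return (c * x + s * y) ** 2 + (-s * x + c * y) ** 2

print(round(exact(1.0, 0.0, 10.0), 6))              # norm conserved: 1.0
print(euler(1.0, 0.0, 0.1, 100) > 1.5)               # coarse steps inflate: True
print(abs(euler(1.0, 0.0, 0.001, 10000) - 1.0) < 0.02)  # finer steps track the flow: True
```

Shrinking η recovers the continuous-time behavior in the limit, which is why results proved in continuous time remain attainable in discrete time with extra care over step sizes.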

Contributions

- Shows that SM-games are amenable to analysis and optimization using first-order methods
- Presents some pathologies that arise in even the simplest smooth games
- Shows that (i) Nash equilibria are stable; (ii) if profits are strictly concave, gradient ascent converges to a Nash equilibrium for all learning rates; and (iii) the dynamics are bounded under reasonable assumptions

References

- Abernethy, J. and Frongillo, R. (2011). A Collaborative Mechanism for Crowdsourcing Prediction Problems. In NeurIPS.
- Abernethy, J., Lai, K. A., and Wibisono, A. (2019). Last-iterate convergence rates for min-max optimization. In arXiv:1906.02027.
- Babichenko, Y. (2016). Query Complexity of Approximate Nash Equilibria. Journal ACM, 63(4).
- Bailey, J. P., Gidel, G., and Piliouras, G. (2019). Finite Regret and Cycles with Fixed Step-Size via Alternating Gradient Descent-Ascent. In arXiv:1907.04392.
- Bailey, J. P. and Piliouras, G. (2018). Multiplicative Weights Update in Zero-Sum Games. In ACM EC.
- Balduzzi, D. (2014). Cortical prediction markets. In AAMAS.
- Balduzzi, D., Racaniere, S., Martens, J., Foerster, J., Tuyls, K., and Graepel, T. (2018). The mechanics of n-player differentiable games. In ICML.
- Barto, A. G., Sutton, R. S., and Anderson, C. W. (1983). Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems. IEEE Trans. Systems, Man, Cyb, 13(5):834–846.
- Baum, E. B. (1999). Toward a Model of Intelligence as an Economy of Agents. Machine Learning, 35:155–185.
- Berard, H., Gidel, G., Almahairi, A., Vincent, P., and Lacoste-Julien, S. (2019). A Closer Look at the Optimization Landscapes of Generative Adversarial Networks. In arXiv:1906.04848.
- Cai, Y., Candogan, O., Daskalakis, C., and Papadimitriou, C. (2016). Zero-sum Polymatrix Games: A Generalization of Minmax. Mathematics of Operations Research, 41(2):648–655.
- Daskalakis, C., Goldberg, P. W., and Papadimitriou, C. (2009). The Complexity of Computing a Nash Equilibrium. SIAM J. Computing, 39(1):195–259.
- Drexler, K. E. (2019). Reframing Superintelligence: Comprehensive AI Services as General Intelligence. Future of Humanity Institute, University of Oxford, Technical Report #2019-1.
- Gemp, I. and Mahadevan, S. (2017). Online Monotone Games. In arXiv:1710.07328.
- Gemp, I. and Mahadevan, S. (2018). Global Convergence to the Equilibrium of GANs using Variational Inequalities. In arXiv:1808.01531.
- Gidel, G., Hemmat, R. A., Pezeshki, M., Lepriol, R., Huang, G., Lacoste-Julien, S., and Mitliagkas, I. (2019). Negative Momentum for Improved Game Dynamics. In AISTATS.
- Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative Adversarial Nets. In NeurIPS.
- Hart, S. and Mas-Colell, A. (2003). Uncoupled Dynamics Do Not Lead to Nash Equilibrium. American Economic Review, 93(5):1830–1836.
- Hu, J. and Storkey, A. (2014). Multi-period Trading Prediction Markets with Connections to Machine Learning. In ICML.
- Kakade, S., Kearns, M., and Ortiz, L. (2003). Graphical economics. In COLT.
- Kakade, S., Kearns, M., Ortiz, L., Pemantle, R., and Suri, S. (2005). Economic properties of social networks. In NeurIPS.
- Kearns, M., Littman, M., and Singh, S. (2001). Graphical models for game theory. In UAI.
- Kingma, D. P. and Ba, J. L. (2015). Adam: A method for stochastic optimization. In ICLR.
- Kurakin, A., Goodfellow, I., and Bengio, S. (2017). Adversarial Machine Learning at Scale. In ICLR.
- Kwee, I., Hutter, M., and Schmidhuber, J. (2001). Market-based reinforcement learning in partially observable worlds. In ICANN.
- Lay, N. and Barbu, A. (2010). Supervised aggregation of classifiers using artificial prediction markets. In ICML.
- Letcher, A., Balduzzi, D., Racaniere, S., Martens, J., Foerster, J., Tuyls, K., and Graepel, T. (2019). Differentiable Game Mechanics. JMLR, 20:1–40.
- Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. In ICLR.
- Mescheder, L. (2018). On the convergence properties of GAN training. In arXiv:1801.04406.
- Mescheder, L., Nowozin, S., and Geiger, A. (2017). The Numerics of GANs. In NeurIPS.
- Minsky, M. (1986). The society of mind. Simon and Schuster, New York NY.
- Monderer, D. and Shapley, L. S. (1996). Potential Games. Games and Economic Behavior, 14:124–143.
- Nemirovski, A., Onn, S., and Rothblum, U. G. (2010). Accuracy certificates for computational problems with convex structure. Mathematics of Operations Research, 35(1).
- Nisan, N., Roughgarden, T., Tardos, E., and Vazirani, V., editors (2007). Algorithmic Game Theory. Cambridge University Press, Cambridge.
- Palaiopanos, G., Panageas, I., and Piliouras, G. (2017). Multiplicative Weights Update with Constant Step-Size in Congestion Games: Convergence, Limit Cycles and Chaos. In NeurIPS.
- Parkes, D. C. and Wellman, M. P. (2015). Economic reasoning and artificial intelligence. Science, 349(6245):267–272.
- Pathak, D., Agrawal, P., Efros, A. A., and Darrell, T. (2017). Curiosity-driven Exploration by Self-supervised Prediction. In ICML.
- Rahwan, I., Cebrian, M., Obradovich, N., Bongard, J., Bonnefon, J.-F., Breazeal, C., Crandall, J. W., Christakis, N. A., Couzin, I. D., Jackson, M. O., Jennings, N. R., Kamar, E., Kloumann, I. M., Larochelle, H., Lazer, D., Mcelreath, R., Mislove, A., Parkes, D. C., Pentland, A. S., Roberts, M. E., Shariff, A., Tenenbaum, J. B., and Wellman, M. (2019). Machine behaviour. Nature, 568:477–486.
- Scott, J. (1999). Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed. Yale University Press.
- Selfridge, O. G. (1958). Pandemonium: a paradigm for learning. In Mechanisation of Thought Processes: Proc Symposium Held at the National Physics Laboratory.
- Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T. P., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489.
- Smith, A. (1776). The Wealth of Nations. W. Strahan and T. Cadell, London.
- Storkey, A. (2011). Machine Learning Markets. In AISTATS.
- Storkey, A., Millin, J., and Geras, K. (2012). Isoelastic Agents and Wealth Updates in Machine Learning Markets. In ICML.
- Sutton, R., Modayil, J., Delp, M., Degris, T., Pilarski, P. M., White, A., and Precup, D. (2011). Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Motor Interaction. In AAMAS.
- Tatarenko, T. and Kamgarpour, M. (2019). Learning Generalized Nash Equilibria in a Class of Convex Games. IEEE Transactions on Automatic Control, 64(4):1426–1439.
- Vickrey, W. (1961). Counterspeculation, Auctions and Competitive Sealed Tenders. J Finance, 16:8–37.
- Vinyals, O., Babuschkin, I., Chung, J., Mathieu, M., Jaderberg, M., Czarnecki, W. M., Dudzik, A., Huang, A., Georgiev, P., Powell, R., Ewalds, T., Horgan, D., Kroiss, M., Danihelka, I., Agapiou, J., Oh, J., Dalibard, V., Choi, D., Sifre, L., Sulsky, Y., Vezhnevets, S., Molloy, J., Cai, T., Budden, D., Paine, T., Gulcehre, C., Wang, Z., Pfaff, T., Pohlen, T., Wu, Y., Yogatama, D., Cohen, J., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Apps, C., Kavukcuoglu, K., Hassabis, D., and Silver, D. (2019). AlphaStar: Mastering the Real-Time Strategy Game StarCraft II. https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/.
- von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100(1):295–320.
- von Neumann, J. and Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press, Princeton NJ.
- Wellman, M. P. and Wurman, P. R. (1998). Market-aware agents for a multiagent world. Robotics and Autonomous Systems, 24:115–125.
- Wu, Y., Donahue, J., Balduzzi, D., Simonyan, K., and Lillicrap, T. (2019). LOGAN: Latent Optimisation for Generative Adversarial Networks. In arXiv:1912.00953.
- Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In CVPR.
