Adaptive Game Playing Using Multiplicative Weights
GAMES AND ECONOMIC BEHAVIOR, no. 1-2 (1999): 79-103
We present a simple algorithm for playing a repeated game. We show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy. Our bounds are nonasymptotic and hold for any opponent. The algorithm, which uses the multiplicative-weight methods of Littlestone...
- The authors present a simple algorithm for playing a repeated game. The authors show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy.
- The authors use $\mathbf{P}(i)$ to denote the probability that $\mathbf{P}$ associates with row $i$, and write $\mathbf{M}(\mathbf{P}, \mathbf{Q}) = \mathbf{P}^{\mathrm{T}} \mathbf{M} \mathbf{Q}$ to denote the expected loss when the two mixed strategies are used.
- The learning algorithm MW starts with some initial mixed strategy $\mathbf{P}_1$, which it uses for the first round of the game.
- In the analysis presented so far, we have shown that the average of the strategies used by MW converges to an optimal strategy.
- In Section 7, we show that this dependence on the problem parameters cannot be improved by any constant factor.
- For the purposes of the proof, we imagine choosing the matrix at random according to an appropriate distribution, and we show that the stated properties hold with strictly positive probability, implying that there must exist at least one matrix for which they hold.
- In other words, property 2 holds with probability at least some threshold. We show that property 1 fails to hold with probability strictly smaller than that same threshold, so both properties must hold simultaneously with positive probability.
- From Theorem 1 and Corollary 4, the authors know that the expected per-iteration loss of MW approaches the optimal achievable value for any fixed strategy as the number of rounds grows.
- Lemma 6 Let the players of a matrix game use any pair of methods for choosing their mixed strategies
- The goal of the row player is the same as before—to minimize its expected average loss over a sequence of repeated games.
- In Section 6.2, the authors show that if an upper bound on the value of the game is known ahead of time, one can use a variant of MW that generates a sequence of row distributions such that the expected loss of the $t$th distribution approaches that bound.
- The authors show that if the row player knows an upper bound on the value of the game, it can use a variant of MW to generate a sequence of mixed strategies that approach a strategy achieving that loss bound. To do that, the authors have the algorithm select a different value of the parameter $\beta$ for each round of the game.
- A single application of the exponential weights algorithm yields approximate solutions for both the column and row players.
- The solution for game matrix $\mathbf{M}$ is related to the on-line prediction problem described in Section 4, while the “dual” solution for $\mathbf{M}^{\mathrm{T}}$ corresponds to a method of learning called “boosting.”
- They may be most appropriate for the setting the authors have described of approximately solving a game when an oracle is available for choosing columns of the matrix on every round.
- The authors show that this dependence of the rate of convergence on the problem parameters is optimal in the sense that no adaptive game-playing algorithm can beat this bound even by a constant factor.
- For any adaptive game-playing algorithm, there exists a game matrix $\mathbf{M}$ of $n$ rows and a sequence of column strategies such that: 1.
- … is chosen at random, the authors need a lower bound on the probability that … the row player has sole control over the choice of $\mathbf{P}$, the authors need a lower …
- Shows that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy.
- Presents a simple algorithm for solving this problem, and gives a simple analysis of the algorithm.
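
The MW procedure summarized in the highlights above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The highlights do not state the update rule, the choice of $\beta$, or the regret bound, so the versions used here are assumptions taken from the standard multiplicative-weights analysis: the update $\mathbf{P}_{t+1}(i) \propto \mathbf{P}_t(i)\,\beta^{\mathbf{M}(i, j_t)}$, the setting $\beta = 1/(1 + \sqrt{2 \ln n / T})$, and the bound $\Delta_T = \sqrt{2 \ln n / T} + \ln n / T$. The names `mw_play`, `choose_column`, and `best_response`, and the rock-paper-scissors matrix, are hypothetical and chosen only for illustration.

```python
import math


def mw_play(M, T, choose_column):
    """Play T rounds of multiplicative weights as the row player.

    M is a loss matrix with entries in [0, 1]; rows are our pure strategies.
    choose_column(P) is a hypothetical oracle that returns the opponent's
    pure column for the round, given our current mixed strategy P.
    """
    n = len(M)
    # Assumed parameter choice (not stated in the summary above).
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / T))
    P = [1.0 / n] * n               # initial mixed strategy P_1 (uniform)
    total_loss = 0.0
    avg_P = [0.0] * n               # running average of P_1, ..., P_T
    col_counts = [0] * len(M[0])    # how often each column was played
    for _ in range(T):
        j = choose_column(P)
        col_counts[j] += 1
        total_loss += sum(P[i] * M[i][j] for i in range(n))
        avg_P = [a + p / T for a, p in zip(avg_P, P)]
        # Multiplicative update: rows with larger loss are scaled down more.
        W = [P[i] * beta ** M[i][j] for i in range(n)]
        Z = sum(W)
        P = [w / Z for w in W]
    # Average loss of the best fixed row against the observed columns.
    best_fixed = min(
        sum(M[i][j] * c for j, c in enumerate(col_counts)) / T
        for i in range(n)
    )
    return total_loss / T, best_fixed, avg_P


# Rock-paper-scissors losses for the row player: 0 = win, 0.5 = tie, 1 = loss.
RPS = [[0.5, 1.0, 0.0],
       [0.0, 0.5, 1.0],
       [1.0, 0.0, 0.5]]


def best_response(P):
    """Adversarial oracle: the column that maximizes the row player's expected loss."""
    expected = [sum(P[i] * RPS[i][j] for i in range(3)) for j in range(3)]
    return max(range(3), key=lambda j: expected[j])


T = 1000
avg_loss, best_fixed, avg_P = mw_play(RPS, T, best_response)
delta = math.sqrt(2 * math.log(3) / T) + math.log(3) / T
print(f"average loss {avg_loss:.4f}, best fixed row {best_fixed:.4f}, bound {delta:.4f}")
```

Under these assumptions, `avg_loss - best_fixed` should stay below `delta` for any column sequence. Because the oracle here best-responds on every round, the averaged strategy `avg_P` should also lose at most the game value (0.5 for this matrix) plus `delta` against any column, which mirrors the highlight about the average of MW's strategies converging to an optimal strategy.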