Adaptive Game Playing Using Multiplicative Weights

GAMES AND ECONOMIC BEHAVIOR, no. 1-2 (1999): 79-103

Abstract

We present a simple algorithm for playing a repeated game. We show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy. Our bounds are nonasymptotic and hold for any opponent. The algorithm, which uses the multiplicative-weight methods of Littlestone...

Introduction
  • The authors present a simple algorithm for playing a repeated game. The authors show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy.
  • The authors use $P(i)$ to denote the probability that the mixed strategy $P$ associates with row $i$, and write $M(P, Q) = P^{\mathsf{T}} M Q$ to denote the expected loss when the two mixed strategies $P$ and $Q$ are used.
  • The learning algorithm MW starts with some initial mixed strategy $P_1$, which it uses for the first round of the game.
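The round-by-round behaviour of MW can be illustrated with a minimal Python sketch. It rests on assumptions not spelled out in this summary: the update rule is the standard multiplicative-weights step (each row's probability is scaled by `beta ** loss` and the result is renormalized), losses lie in [0, 1], and the initial strategy is uniform. The names `expected_loss`, `mw_update`, `play_mw` and the parameter `beta` are chosen for illustration only.

```python
def expected_loss(M, P, Q):
    """Expected loss M(P, Q) = P^T M Q for a loss matrix M (rows x columns)."""
    return sum(P[i] * M[i][j] * Q[j]
               for i in range(len(P)) for j in range(len(Q)))


def mw_update(M, P, Q, beta):
    """One multiplicative-weights step (assumed form of the MW update).

    Row i is scaled by beta ** M(i, Q), where M(i, Q) is the expected loss of
    pure row i against the column strategy Q, then weights are renormalized.
    """
    row_losses = [sum(M[i][j] * Q[j] for j in range(len(Q)))
                  for i in range(len(P))]
    weights = [P[i] * beta ** row_losses[i] for i in range(len(P))]
    total = sum(weights)
    return [w / total for w in weights]


def play_mw(M, column_strategies, beta):
    """Play MW against a fixed sequence of column strategies.

    Returns the list of row strategies used and the cumulative expected loss.
    """
    n = len(M)
    P = [1.0 / n] * n  # uniform initial strategy (an assumption of this sketch)
    history, total_loss = [], 0.0
    for Q in column_strategies:
        history.append(P)
        total_loss += expected_loss(M, P, Q)
        P = mw_update(M, P, Q, beta)
    return history, total_loss
```

Calling `play_mw(M, qs, beta=0.5)` with a small loss matrix and a list of column strategies shows the row player's probability mass shifting toward the rows that have incurred the least loss so far.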
Highlights
  • We present a simple algorithm for playing a repeated game
  • In the analysis presented so far we have shown that the average of the strategies used by MW converges to an optimal strategy
  • We show that if the row player knows an upper bound on the value of the game it can use a variant of MW to generate a sequence of mixed strategies that approach a strategy which achieves that bound
  • In Section 7, we show that this dependence on $n$ and $T$ cannot be improved by any constant factor
  • For the purposes of the proof, we imagine choosing the matrix at random according to an appropriate distribution, and we show that the stated properties hold with strictly positive probability, implying that there must exist at least one matrix for which they hold
  • In other words, property 2 holds with at least a certain probability. We show that property 1 fails to hold with probability strictly smaller than that same value, so that both properties must hold simultaneously with positive probability
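For orientation, the guarantee whose dependence on the number of rows and rounds is discussed above can be written out as follows. This is a sketch recalled from the standard statement of the result (the paper's Corollary 4), not taken from this summary: it assumes $n$ rows, $T$ rounds, losses in $[0, 1]$, a uniform initial strategy, and the parameter choice $\beta = 1 / (1 + \sqrt{2 \ln n / T})$.

```latex
\frac{1}{T}\sum_{t=1}^{T} M(P_t, Q_t)
  \;\le\; \min_{P} \frac{1}{T}\sum_{t=1}^{T} M(P, Q_t) + \Delta_T,
\qquad
\Delta_T = \sqrt{\frac{2 \ln n}{T}} + \frac{\ln n}{T}.
```

Under these assumptions the excess loss $\Delta_T$ shrinks at rate $O(\sqrt{\ln n / T})$, which is the rate the lower bound in Section 7 is said to match.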
Results
  • From Theorem 1 and Corollary 4 the authors know that the expected per-iteration loss of MW approaches the optimal achievable value for any fixed strategy as $T \to \infty$.
  • Lemma 6 Let the players of a matrix game use any pair of methods for choosing their mixed strategies
  • The goal of the row player is the same as before—to minimize its expected average loss over a sequence of repeated games.
  • In Section 6.2 the authors show that if an upper bound on the value of the game is known ahead of time, one can use a variant of MW that generates a sequence of row distributions such that the expected loss of the $t$-th distribution approaches that bound.
  • The authors show that if the row player knows an upper bound on the value of the game, it can use a variant of MW to generate a sequence of mixed strategies that approach a strategy which achieves that bound. To do that, the authors have the algorithm select a different value of the parameter $\beta$ for each round of the game.
  • A single application of the exponential weights algorithm yields approximate solutions for both the column and row players.
  • The solution for game matrix $M$ is related to the on-line prediction problem described in Section 4, while the “dual” solution for $M^{\mathsf{T}}$ corresponds to a method of learning called “boosting.”
  • They may be most appropriate for the setting the authors have described of approximately solving a game when an oracle is available for choosing columns of the matrix on every round.
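The idea of approximately solving a game with a column oracle can be sketched in Python. The assumptions are mine rather than the summary's: the oracle returns a pure column that maximizes the row player's expected loss against the current row strategy, losses lie in [0, 1], and `beta` is set to `1 / (1 + sqrt(2 ln n / T))`. The function name `solve_game_mw` is hypothetical.

```python
import math


def solve_game_mw(M, T):
    """Approximately solve a zero-sum game with row-player losses in [0, 1].

    Returns (avg_P, avg_Q): the average of the row strategies used by the
    multiplicative-weights player and the empirical distribution of the
    columns picked by the oracle.
    """
    n, m = len(M), len(M[0])
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / T))  # assumed tuning
    P = [1.0 / n] * n  # uniform initial row strategy
    avg_P = [0.0] * n
    col_counts = [0] * m
    for _ in range(T):
        # Assumed oracle: best-response column against the current P.
        col_losses = [sum(P[i] * M[i][j] for i in range(n)) for j in range(m)]
        j_star = max(range(m), key=lambda j: col_losses[j])
        col_counts[j_star] += 1
        # Accumulate the running average of the row strategies.
        avg_P = [a + p / T for a, p in zip(avg_P, P)]
        # Multiplicative-weights update against the pure column j_star.
        weights = [P[i] * beta ** M[i][j_star] for i in range(n)]
        total = sum(weights)
        P = [w / total for w in weights]
    avg_Q = [c / T for c in col_counts]
    return avg_P, avg_Q
```

A single run returns approximate strategies for both players: `avg_P` for the row player and `avg_Q` for the column player. With this tuning, both should be within roughly `sqrt(2 ln n / T) + ln n / T` of the game value in worst-case loss.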
Conclusion
  • The authors show that this dependence of the rate of convergence on $n$ and $T$ is optimal in the sense that no adaptive game-playing algorithm can beat this bound even by a constant factor.
  • For any adaptive game-playing algorithm, there exists a game matrix $M$ of $n$ rows and a sequence of column strategies such that: 1.
  • Because the matrix is chosen at random while the row player has sole control over the choice of its strategy $P$, the authors need a lower bound on the probability that
Funding
  • Shows that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy
  • Presents a simple algorithm for solving this problem, and gives a simple analysis of the algorithm