# Online Bayesian Persuasion

NeurIPS 2020

Abstract

In Bayesian persuasion, an informed sender has to design a signaling scheme that discloses the right amount of information so as to influence the behavior of a self-interested receiver. This kind of strategic interaction is ubiquitous in real-world economic scenarios. However, the seminal model by Kamenica and Gentzkow makes some stringen...

Introduction

- Bayesian persuasion was first introduced by Kamenica and Gentzkow [23] as the problem faced by an informed sender trying to influence the behavior of a self-interested receiver via the strategic provision of payoff-relevant information.
- Receiver’s best-response set: after observing a signal s ∈ S that induces a posterior ξ ∈ Ξ, the receiver best responds by choosing an action that maximizes her/his expected utility (step (v)).
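
The best-response step can be illustrated with a minimal sketch. The utility matrix layout `receiver_utility[action][state]`, the function names, and the toy numbers are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def expected_utility(posterior: Sequence[float], utility_row: Sequence[float]) -> float:
    """Expected utility of one action under a posterior over states."""
    return sum(p * u for p, u in zip(posterior, utility_row))


def best_response_set(
    posterior: Sequence[float],
    receiver_utility: Sequence[Sequence[float]],
    tol: float = 1e-9,
) -> List[int]:
    """Return every action whose expected utility is maximal under the posterior.

    receiver_utility[a][theta] is the (hypothetical) payoff of action a in state theta.
    """
    values = [expected_utility(posterior, row) for row in receiver_utility]
    best = max(values)
    return [a for a, v in enumerate(values) if v >= best - tol]


# Toy instance: action 0 pays off in state 0, action 1 pays off in state 1.
toy_utility = [[1.0, 0.0], [0.0, 1.0]]
print(best_response_set([0.3, 0.7], toy_utility))
print(best_response_set([0.5, 0.5], toy_utility))
```

The function returns a set rather than a single action because several actions can tie at a given posterior; how ties are broken is a modeling choice left outside this sketch.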

Highlights

- Our goal is the design of an online algorithm that recommends a signaling scheme at each round, guaranteeing an expected utility for the sender close to that of the best-in-hindsight signaling scheme. We study this problem under two models of feedback: in the full information model, the sender selects a signaling scheme and later observes the type of the best-responding receiver; in the partial information model, the sender only observes the actions taken by the receiver
- To prove our hardness result, we show as an intermediate step that approximating an optimal signaling scheme is computationally intractable even in the offline Bayesian persuasion problem, in which the sender knows the probability distribution over the receiver’s types
- For an arbitrary sequence of receiver’s types, we show that there exists w ∈ W guaranteeing to the sender an expected utility equal to that of the best-in-hindsight signaling scheme
- We keep the bias and the range of the estimators small through two technical restrictions: (i) we focus on posteriors that can be induced by a signaling scheme with at least some (‘not too small’) probability, which ensures that the resulting estimators have a limited range; and (ii) we restrict the full-information algorithm to signaling schemes $W^{\circ} \subseteq W$ inducing a small number of posteriors, which guarantees estimators with a small bias
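
As a rough illustration of what "close to the best-in-hindsight signaling scheme" means, the sketch below computes the sender's regret against the best fixed scheme in a small finite candidate list. Everything here is an assumption made for the example: a scheme is represented as a list of `(probability, posterior)` pairs, ties in the receiver's best response are broken in the sender's favor, and the benchmark ranges over the given candidates only rather than over all of W.

```python
from typing import List, Sequence, Tuple

Scheme = List[Tuple[float, Sequence[float]]]  # (probability, posterior) pairs


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def sender_value(scheme: Scheme, sender_utility, receiver_utility, tol: float = 1e-9) -> float:
    """Sender's expected utility when the receiver best responds to each posterior.

    Utilities are indexed as utility[action][state]. Ties are broken in the
    sender's favor (an assumption of this sketch).
    """
    total = 0.0
    for prob, posterior in scheme:
        values = [dot(posterior, row) for row in receiver_utility]
        best = max(values)
        tied = [a for a, v in enumerate(values) if v >= best - tol]
        total += prob * max(dot(posterior, sender_utility[a]) for a in tied)
    return total


def regret(played, candidates, types, sender_utility, receiver_utilities) -> float:
    """Utility of the best fixed candidate in hindsight minus the utility actually earned."""
    earned = sum(
        sender_value(w, sender_utility, receiver_utilities[k]) for w, k in zip(played, types)
    )
    best_fixed = max(
        sum(sender_value(w, sender_utility, receiver_utilities[k]) for k in types)
        for w in candidates
    )
    return best_fixed - earned


# Toy instance: two states with a uniform prior; the sender always prefers action 1.
sender_utility = [[0.0, 0.0], [1.0, 1.0]]
receiver_utilities = [
    [[0.0, 0.0], [-1.0, 1.0]],  # type 0 takes action 1 when state 1 has probability >= 1/2
    [[0.0, 0.0], [-3.0, 1.0]],  # type 1 takes action 1 when state 1 has probability >= 3/4
]
no_info = [(1.0, [0.5, 0.5])]
full_info = [(0.5, [1.0, 0.0]), (0.5, [0.0, 1.0])]
partial = [(2 / 3, [0.25, 0.75]), (1 / 3, [1.0, 0.0])]

types = [0, 1, 1, 0]
played = [no_info] * len(types)
print(regret(played, [no_info, full_info, partial], types, sender_utility, receiver_utilities))
```

Each candidate scheme in the toy instance averages back to the uniform prior, which is the consistency requirement a signaling scheme must satisfy.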

Results

- The authors' first result is negative: for any α < 1, it is unlikely that there exists a no-α-regret algorithm for the online Bayesian persuasion problem requiring a per-round running time polynomial in the size of the instance.
- In order to prove the result, the authors provide an intermediate step, showing that the problem of approximating an optimal signaling scheme is computationally intractable even in the offline Bayesian persuasion problem in which the sender knows the probability distribution over the receiver’s types.
- The online problem is nontrivial because, at every round t, the sender has to choose a signaling scheme among infinitely many alternatives, and her/his utility depends on the receiver’s best response, which yields a function that is neither linear nor convex.
- The authors show that it is possible to provide a no-regret algorithm for the full information setting by restricting the sender’s action space to a finite set of posteriors.
- The authors show that it is always possible to design a sender-optimal signaling scheme defined as a convex combination of a specific finite set of posteriors.
- For an arbitrary sequence of receiver’s types, the authors show that there exists w ∈ W guaranteeing to the sender an expected utility equal to that of the best-in-hindsight signaling scheme.
- Given an online Bayesian persuasion problem with full-information feedback, there exists an online algorithm such that, for every sequence of receiver’s types $k = \{k_t\}_{t \in [T]}$, the regret satisfies $R_T \le O(\ldots)$
- During each block $I_\tau$ with $\tau \in [Z]$, Algorithm 1 alternates between two tasks: (i) exploration (Line 8), trying all the signaling schemes in a subset of $W$ given as input, so as to compute the required estimates of the sender’s expected utilities; and (ii) exploitation (Line 10), playing the strategy $q_\tau$ recommended by FULL-INFORMATION(·) for $I_\tau$.
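
Once the sender's choices are restricted to a finite set, any standard experts algorithm can be run over that set under full-information feedback. The sketch below uses Hedge (multiplicative weights) as one such choice; the source does not say which experts algorithm the authors use, so treat this as an illustrative stand-in. The function name, the learning rate `eta`, and the assumption that utilities lie in [0, 1] are all hypothetical.

```python
import math
from typing import List, Sequence


def hedge(utilities_per_round: Sequence[Sequence[float]], eta: float) -> List[List[float]]:
    """Run Hedge over a finite set of options.

    utilities_per_round[t][i] is the utility (assumed in [0, 1]) that option i
    would have earned at round t; under full-information feedback the whole
    vector is revealed after the round is played.
    Returns the probability distribution played at each round.
    """
    n = len(utilities_per_round[0])
    weights = [1.0] * n
    played = []
    for utils in utilities_per_round:
        total = sum(weights)
        played.append([w / total for w in weights])
        # Reward options in proportion to the utility they would have earned.
        weights = [w * math.exp(eta * u) for w, u in zip(weights, utils)]
    return played


# Toy run: option 1 is always better, so the distribution drifts toward it.
history = hedge([[0.0, 1.0]] * 50, eta=0.5)
print(history[0], history[-1])
```

In the persuasion setting, option i would correspond to one signaling scheme in the finite set, and its per-round utility would be computed from the receiver type observed at the end of the round.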

Conclusion

- Given an online Bayesian persuasion problem with partial feedback, there exist $W^{\circ} \subseteq W$, a second subset of $W$ used for exploration, and estimators $u^{s}_{I_\tau}(w)$ such that Algorithm 1 provides the following regret bound: $n \, m^{2/3} \, d \, \log^{1/3} T^{1/5}$
- To prove this result, the authors show that Algorithm 1 provides a regret bound that depends on the number of signaling schemes used for exploration, the logarithm of $|W^{\circ}|$, and the range and bias of the estimators $u^{s}_{I_\tau}(w)$.
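
The block structure of Algorithm 1 can be sketched as a schedule that marks each round as exploration or exploitation. This is only a skeleton under stated assumptions: the function name `plan_blocks`, the equal-length blocks, and the uniformly random placement of exploration rounds inside each block are choices made for this example, not details confirmed by the source.

```python
import random
from typing import List, Tuple


def plan_blocks(T: int, Z: int, num_explore: int, seed: int = 0) -> List[List[Tuple[str, int]]]:
    """Split T rounds into Z equal blocks and schedule exploration inside each.

    In every block, num_explore rounds are picked at random, one per exploration
    scheme; the remaining rounds exploit the strategy recommended for that block.
    Leftover rounds (T % Z) are ignored in this sketch.
    """
    block_len = T // Z
    if block_len < num_explore:
        raise ValueError("each block must be long enough to try every exploration scheme")
    rng = random.Random(seed)
    plan = []
    for tau in range(Z):
        rounds = list(range(tau * block_len, (tau + 1) * block_len))
        explore_rounds = rng.sample(rounds, num_explore)
        scheme_at = {t: j for j, t in enumerate(explore_rounds)}
        block = [
            ("explore", scheme_at[t]) if t in scheme_at else ("exploit", tau)
            for t in rounds
        ]
        plan.append(block)
    return plan


for block in plan_blocks(T=20, Z=4, num_explore=3):
    print(block)
```

An entry `("explore", j)` means the j-th exploration scheme is played in that round, and `("exploit", tau)` means the strategy recommended for block tau is played. The trade-off visible in the schedule is the one the regret bound reflects: more exploration schemes give better utility estimates but leave fewer rounds per block for exploitation.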

Related work

- The closest line of research to ours is the one studying online learning problems in Stackelberg games. In these games, a leader commits to a probability distribution over a set of actions, and a follower plays an action maximizing her/his utility given the leader’s commitment [33]. In this setting, Letchford et al. [25] and Blum et al. [10] study the problem of computing the leader’s best strategy against an unknown follower using a polynomial number of best-response queries. Marecki et al. [27] study the problem with a single follower whose type is drawn from a Bayesian prior.

Balcan et al. [8] study how to minimize the leader’s regret in an online setting in which the follower’s type is unknown and chosen adversarially from a finite set. Although the problem is conceptually similar to ours, the Bayesian persuasion framework presents a number of additional challenges: the solution to a Stackelberg game consists of a point in a finite-dimensional simplex, while the solution to a Bayesian persuasion problem is a probability distribution with potentially infinite support size. This probability distribution is subject to additional consistency constraints, which (under partial feedback) rule out the possibility of exploiting unbiased estimators of the sender’s expected utility.

Funding

- This work has been partially supported by the Italian MIUR PRIN 2017 Project ALGADIMAR “Algorithms, Games, and Digital Market”.

References

- [1] Ricardo Alonso and Odilon Câmara. Persuading voters. American Economic Review, 106(11):3590–3605, 2016.
- [2] Gerry Antioch et al. Persuasion is now 30 per cent of US GDP: Revisiting McCloskey and Klamer after a quarter of a century. Economic Round-up, (1):1, 2013.
- [3] Benjamin Assarf, Ewgenij Gawrilow, Katrin Herr, Michael Joswig, Benjamin Lorenz, Andreas Paffenholz, and Thomas Rehn. Computing convex hulls and counting integer points with polymake. Mathematical Programming Computation, 9(1):1–38, 2017.
- [4] Baruch Awerbuch and Robert Kleinberg. Online linear optimization and adaptive routing. Journal of Computer and System Sciences, 74(1):97–114, 2008.
- [5] Baruch Awerbuch and Yishay Mansour. Adapting to a reliable network path. In Proceedings of the Twenty-Second Annual Symposium on Principles of Distributed Computing, pages 360–367, 2003.
- [6] Yakov Babichenko and Siddharth Barman. Algorithmic aspects of private Bayesian persuasion. In Innovations in Theoretical Computer Science Conference, 2017.
- [7] Ashwinkumar Badanidiyuru, Kshipra Bhawalkar, and Haifeng Xu. Targeting and signaling in ad auctions. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2545–2563, 2018.
- [8] Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Commitment without regrets: Online learning in Stackelberg security games. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, pages 61–78, 2015.
- [9] Umang Bhaskar, Yu Cheng, Young Kun Ko, and Chaitanya Swamy. Hardness results for signaling in Bayesian zero-sum and network routing games. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages 479–496, 2016.
- [10] Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Learning optimal commitment to overcome insecurity. In Advances in Neural Information Processing Systems, pages 1826–1834, 2014.
- [11] Peter Bro Miltersen and Or Sheffet. Send mixed signals: earn more, work less. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 234–247, 2012.
- [12] Sébastien Bubeck and Nicolò Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.
- [13] Ozan Candogan. Persuasion in networks: Public signals and k-cores. In Proceedings of the 2019 ACM Conference on Economics and Computation, pages 133–134, 2019.
- [14] Matteo Castiglioni, Andrea Celli, and Nicola Gatti. Persuading voters: It’s easy to whisper, it’s hard to speak loud. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, pages 1870–1877, 2020.
- [15] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
- [16] Yu Cheng, Ho Yee Cheung, Shaddin Dughmi, Ehsan Emamjomeh-Zadeh, Li Han, and Shang-Hua Teng. Mixture selection, mechanism design, and signaling. In 56th Annual Symposium on Foundations of Computer Science, pages 1426–1445, 2015.
- [17] Vincent Conitzer and Dmytro Korzhyk. Commitment to correlated strategies. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, pages 632–637, 2011.
- [18] Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. In Proceedings of the 7th ACM Conference on Electronic Commerce, pages 82–90, 2006.
- [19] Yuval Emek, Michal Feldman, Iftah Gamzu, Renato Paes Leme, and Moshe Tennenholtz. Signaling schemes for revenue maximization. ACM Transactions on Economics and Computation, 2(2):1–19, 2014.
- [20] Ewgenij Gawrilow and Michael Joswig. polymake: a framework for analyzing convex polytopes. In Polytopes—combinatorics and computation (Oberwolfach, 1997), volume 29 of DMV Sem., pages 43–73. Birkhäuser, Basel, 2000.
- [21] Venkatesan Guruswami and Prasad Raghavendra. Hardness of learning halfspaces with noise. SIAM Journal on Computing, 39(2):742–765, 2009.
- [22] Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.
- [23] Emir Kamenica and Matthew Gentzkow. Bayesian persuasion. American Economic Review, 101(6):2590–2615, 2011.
- [24] Emir Kamenica. Bayesian persuasion and information design. Annual Review of Economics, 11:249–272, 2019.
- [25] Joshua Letchford, Vincent Conitzer, and Kamesh Munagala. Learning and approximating the optimal strategy to commit to. In International Symposium on Algorithmic Game Theory, pages 250–262, 2009.
- [26] Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis, and Zhiwei Steven Wu. Bayesian exploration: Incentivizing exploration in Bayesian games. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages 661–661, 2016.
- [27] Janusz Marecki, Gerry Tesauro, and Richard Segal. Playing repeated Stackelberg games with unknown opponents. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, pages 821–828, 2012.
- [28] Donald McCloskey and Arjo Klamer. One quarter of GDP is persuasion. The American Economic Review, 85(2):191–195, 1995.
- [29] Praveen Paruchuri, Jonathan P. Pearce, Janusz Marecki, Milind Tambe, Fernando Ordonez, and Sarit Kraus. Playing games for security: An efficient exact algorithm for solving Bayesian Stackelberg games. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, pages 895–902, 2008.
- [30] Zinovi Rabinovich, Albert Xin Jiang, Manish Jain, and Haifeng Xu. Information disclosure as a means to security. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pages 645–653, 2015.
- [31] Tim Roughgarden and Joshua R. Wang. Minimizing regret with multiple reserves. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages 601–616, 2016.
- [32] Shoshana Vasserman, Michal Feldman, and Avinatan Hassidim. Implementing the wisdom of Waze. In Twenty-Fourth International Joint Conference on Artificial Intelligence, pages 660–666, 2015.
- [33] Bernhard Von Stengel and Shmuel Zamir. Leadership games with convex strategy sets. Games and Economic Behavior, 69(2):446–457, 2010.
- [34] Haifeng Xu, Rupert Freeman, Vincent Conitzer, Shaddin Dughmi, and Milind Tambe. Signaling in Bayesian Stackelberg games. In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems, pages 150–158, 2016.
- [35] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning, pages 928–936, 2003.
