# Finding All ϵ-Good Arms in Stochastic Bandits

NeurIPS 2020

Abstract

The pure-exploration problem in stochastic multi-armed bandits aims to find one or more arms with the largest (or near largest) means. Examples include finding an ϵ-good arm, best-arm identification, top-k arm identification, and finding all arms with means above a specified threshold. However, the problem of finding all ϵ-good arms has …

Introduction

- The authors propose a new multi-armed bandit problem where the objective is to return all arms that are ϵ-good relative to the best arm.
- If |Gϵ| = k, [6] shows that on the order of (n/k)ϵ^-2 log(1/δ) samples in expectation are needed to find such an arm, and [10] provide an algorithm that matches this up to doubly logarithmic factors, though methods such as [4, 9, 18, 19] achieve better empirical performance.
- In the fixed confidence setting, threshold bandits is closely related to multiple hypothesis testing, and recent work [21] achieves tight upper and lower bounds for this problem, including tighter logarithmic factors similar to those for TOP-k.
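To make the objective concrete, here is a minimal sketch (illustrative only: the function name and example means are hypothetical, and in the actual bandit setting the means are unknown and must be estimated from samples). The ALL-ϵ goal is to return exactly the set Gϵ = {i : μi ≥ maxj μj − ϵ}.

```python
# Illustrative sketch of the ALL-epsilon objective with *known* means.
# In a real bandit instance the means are unknown and must be estimated by sampling.
def epsilon_good_set(means, eps):
    """Return the indices of all arms whose mean is within eps of the best mean."""
    best = max(means)
    return {i for i, m in enumerate(means) if m >= best - eps}

means = [0.9, 0.85, 0.6, 0.3]                 # hypothetical arm means
print(sorted(epsilon_good_set(means, 0.1)))   # → [0, 1]
```

Note how this differs from the two adjacent settings: unlike TOP-k, the size of the returned set is not known in advance, and unlike threshold bandits, the threshold μ1 − ϵ depends on the unknown best mean.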

Highlights

- We propose a new multi-armed bandit problem where the objective is to return all arms that are ϵ-good relative to the best arm
- The ALL-ϵ problem is a novel setting in the bandits literature, adjacent to two other settings for finding many good arms: TOP-k, where the goal is to return the arms with the k highest means, and threshold bandits, where the goal is to identify all arms above a fixed threshold
- We argue that the ALL-ϵ problem formulation is more appropriate in many applications, and we show that it presents some unique challenges that make its solution distinct from TOP-k and threshold bandits
- We extend the Simulator technique via a novel reduction to composite hypothesis testing in order to connect to ALL-ϵ
- Ω(Σi (μ1 − μi)^-2) samples are necessary for instances where no arm is within 2ϵ of μ1, in addition to the lower bound of Theorem 2.1
- We point out that Theorem 4.1 highlights that (ST)² is optimal on these instances up to a log factor! The algorithm we present, FAREAST, improves (ST)²'s dependence on δ and matches the lower bound in Theorem 4.1 for certain instances

Results

- The authors pull the arm with the highest upper confidence bound, as in the UCB algorithm [3], to refine an estimate of the threshold using the highest empirical mean (Sample the Threshold).
- The authors first state an improved sample complexity lower bound for a family of problem instances that makes explicit the moderate confidence terms.
- Ω(Σi (μ1 − μi)^-2) samples are necessary for instances where no arm is within 2ϵ of μ1, in addition to the lower bound of Theorem 2.1.
- The elimination algorithm for a sampled threshold proceeds in rounds r and maintains sets Ĝr and B̂r of arms so far declared to be good or bad.
- Once every arm's confidence interval separates its mean from the estimated threshold, Ĝr = Gϵ, i.e., all ϵ-good arms have been found, having used fewer than a constant times Σi |μ1 − αϵ − μi|^-2 log(n/δ) samples, matching the lower bound.
- For δ sufficiently small, FAREAST terminates on the event E and returns a set Ĝ such that Gϵ ⊆ Ĝ ⊆ Gϵ+ in a number of samples no more than a constant times (3), the complexity of (ST)².
- Let A = [n] be the active set, and let Ni = 0 track the total number of samples of arm i by the Good Filter.
- Both (ST)² and FAREAST are optimal in this setting; the authors show the scaling of their sample complexity as the number of arms increases while keeping the threshold, α, and ϵ constant.
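The elimination idea behind these results can be sketched roughly as follows. This is a simplified illustration, not the paper's (ST)² or FAREAST as stated: the confidence width, the doubling round schedule, and all identifiers are assumptions, and for simplicity every arm is re-sampled each round so the threshold estimate keeps sharpening.

```python
import math
import random

def all_eps_elimination(pull, n_arms, eps, delta, max_rounds=12):
    """Simplified elimination with a sampled threshold (illustrative only):
    each round, pull every arm 2**r times, bracket the unknown threshold
    mu_1 - eps using the highest empirical mean, and declare an arm good
    (bad) once its confidence interval clears (falls below) the bracket."""
    sums = [0.0] * n_arms
    count = 0                      # pulls per arm so far (all arms pulled equally)
    good, bad = set(), set()
    for r in range(1, max_rounds + 1):
        for i in range(n_arms):
            for _ in range(2 ** r):
                sums[i] += pull(i)
        count += 2 ** r
        means = [s / count for s in sums]
        # crude anytime confidence width for 1-sub-Gaussian rewards (an assumption)
        width = math.sqrt(2 * math.log(4 * n_arms * r * r / delta) / count)
        thr_hi = max(means) + width - eps   # upper bracket on mu_1 - eps
        thr_lo = max(means) - width - eps   # lower bracket on mu_1 - eps
        for i in range(n_arms):
            if i in good or i in bad:
                continue
            if means[i] - width >= thr_hi:   # certainly above the threshold
                good.add(i)
            elif means[i] + width < thr_lo:  # certainly below the threshold
                bad.add(i)
        if len(good) + len(bad) == n_arms:
            break
    return good

random.seed(0)
mu = [0.9, 0.8, 0.2]                        # hypothetical arm means
g = all_eps_elimination(lambda i: mu[i] + random.gauss(0, 0.1), 3, 0.3, 0.05)
print(sorted(g))                            # arms 0 and 1 are 0.3-good
```

The distinctive difficulty of ALL-ϵ is visible here: the threshold μ1 − ϵ is not given (as it would be in threshold bandits) but is bracketed from the highest empirical mean, so its own uncertainty enters every good/bad decision.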

Conclusion

- As discussed in the introduction, in many applications such as the New Yorker Cartoon Caption Contest (NYCCC), the ALL-ϵ objective returns a set of good arms which can be screened further to choose a favorite.
- They also compare against TOP-k with k set to the number of ϵ-good arms, and a threshold bandit, APT [1], given the value 0.9μ1.
- The authors believe that the objective studied in this work, that of returning all arms whose mean is quantifiably near-best, more naturally aligns with practical objectives as diverse as finding funny captions and performing medical tests.


Funding

- Funding Transparency Statement: The work presented in this paper was supported by ARO grant W911NF-15-1-0479. Additionally, this work was partially supported by the MADLab AF Center of Excellence FA9550-18-1-0166.

Reference

- Andrea Locatelli, Maurilio Gutzeit, and Alexandra Carpentier. An optimal algorithm for the thresholding bandit problem. In Proceedings of the 33rd International Conference on International Conference on Machine Learning-Volume 48, pages 1690–1698. JMLR.org, 2016.
- Serge Christmann-Franck, Gerard JP van Westen, George Papadatos, Fanny Beltran Escudie, Alexander Roberts, John P Overington, and Daniel Domine. Unprecedently large-scale kinase inhibitor set enabling the accurate prediction of compound–kinase activities: A way toward selective promiscuity by design? Journal of Chemical Information and Modeling, 56(9):1654–1675, 2016.
- Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine learning, 47(2-3):235–256, 2002.
- Shivaram Kalyanakrishnan, Ambuj Tewari, Peter Auer, and Peter Stone. PAC subset selection in stochastic multi-armed bandits. In ICML, volume 12, pages 655–662, 2012.
- Sébastien Bubeck, Tengyao Wang, and Nitin Viswanathan. Multiple identifications in multi-armed bandits. In International Conference on Machine Learning, pages 258–265, 2013.
- Emilie Kaufmann, Olivier Cappé, and Aurélien Garivier. On the complexity of best-arm identification in multi-armed bandit models. The Journal of Machine Learning Research, 17(1):1–42, 2016.
- Victor Gabillon, Mohammad Ghavamzadeh, and Alessandro Lazaric. Best arm identification: A unified approach to fixed budget and fixed confidence. In Advances in Neural Information Processing Systems, pages 3212–3220, 2012.
- Wenbo Ren, Jia Liu, and Ness B Shroff. Exploring k out of top ρ fraction of arms in stochastic bandits. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2820–2828, 2019.
- Max Simchowitz, Kevin Jamieson, and Benjamin Recht. The simulator: Understanding adaptive sampling in the moderate-confidence regime. In Conference on Learning Theory, pages 1794–1834, 2017.
- Zohar Karnin, Tomer Koren, and Oren Somekh. Almost optimal exploration in multi-armed bandits. In International Conference on Machine Learning, pages 1238–1246, 2013.
- Shie Mannor and John N Tsitsiklis. The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research, 5(Jun):623–648, 2004.
- Eyal Even-Dar, Shie Mannor, and Yishay Mansour. PAC bounds for multi-armed bandit and Markov decision processes. In International Conference on Computational Learning Theory, pages 255–270. Springer, 2002.
- Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. Journal of machine learning research, 7(Jun):1079–1105, 2006.
- Shivaram Kalyanakrishnan and Peter Stone. Efficient selection of multiple bandit arms: Theory and practice. In ICML, volume 10, pages 511–518, 2010.
- Julian Katz-Samuels and Kevin Jamieson. The true sample complexity of identifying good arms. arXiv preprint arXiv:1906.06594, 2019.
- Rémy Degenne and Wouter M Koolen. Pure exploration with multiple correct answers. In Advances in Neural Information Processing Systems, pages 14564–14573, 2019.
- Emilie Kaufmann and Shivaram Kalyanakrishnan. Information complexity in bandit subset selection. In Conference on Learning Theory, pages 228–251, 2013.
- Arghya Roy Chaudhuri and Shivaram Kalyanakrishnan. PAC identification of a bandit arm relative to a reward quantile. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
- Arghya Roy Chaudhuri and Shivaram Kalyanakrishnan. PAC identification of many good arms in stochastic multi-armed bandits. In International Conference on Machine Learning, pages 991–1000, 2019.
- Hideaki Kano, Junya Honda, Kentaro Sakamaki, Kentaro Matsuura, Atsuyoshi Nakamura, and Masashi Sugiyama. Good arm identification via bandit feedback. Machine Learning, 108(5):721–745, 2019.
- Kevin Jamieson and Lalit Jain. A bandit approach to multiple testing with false discovery control. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pages 3664–3674, Red Hook, NY, USA, 2018. Curran Associates Inc.
- Matthew L Malloy and Robert D Nowak. Sequential testing for sparse recovery. IEEE Transactions on Information Theory, 60(12):7862–7873, 2014.
- Kevin Jamieson, Matthew Malloy, Robert Nowak, and Sébastien Bubeck. lil'UCB: An optimal exploration algorithm for multi-armed bandits. In Conference on Learning Theory, pages 423–439, 2014.
- Steven R Howard, Aaditya Ramdas, Jon McAuliffe, and Jasjeet Sekhon. Uniform, nonparametric, non-asymptotic confidence sequences. arXiv preprint arXiv:1810.08240, 2018.
- Lijie Chen, Jian Li, and Mingda Qiao. Nearly instance optimal sample complexity bounds for top-k arm selection. In Artificial Intelligence and Statistics, pages 101–110, 2017.
- Ervin Tanczos, Robert Nowak, and Bob Mankoff. A KL-LUCB algorithm for large-scale crowdsourcing. In Advances in Neural Information Processing Systems, pages 5894–5903, 2017.
- David H Drewry, Carrow I Wells, David M Andrews, Richard Angell, Hassan Al-Ali, Alison D Axtman, Stephen J Capuzzi, Jonathan M Elkins, Peter Ettmayer, Mathias Frederiksen, et al. Progress towards a public chemogenomic set for protein kinases and a call for contributions. PloS one, 12(8), 2017.
- Matteo Bocci, Jonas Sjölund, Ewa Kurzejamska, David Lindgren, Michael Bartoschek, Mattias Höglund, Kristian Pietras, et al. Activin receptor-like kinase 1 is associated with immune cell infiltration and regulates clec14a transcription in cancer. Angiogenesis, 22(1):117–131, 2019.
- Patricia Dranchak, Ryan MacArthur, Rajarshi Guha, William J Zuercher, David H Drewry, Douglas S Auld, and James Inglese. Profile of the GSK published protein kinase inhibitor set across ATP-dependent and -independent luciferases: implications for reporter-gene assays. PloS One, 8(3), 2013.
