Online Algorithm for Unsupervised Sequential Selection with Contextual Information
NeurIPS 2020
In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem in which the loss of an arm cannot be inferred from the observed feedback. In our setup, arms are associated with fixed costs and are ordered, forming a cascade. In each round, a context is presented, and …
- Industrial systems, such as those found in medical, airport security, and manufacturing, utilize a suite of tests or classifiers for monitoring patients, people, and products.
- Tests have costs with the more intrusive and informative ones resulting in higher monetary costs and higher latency.
- For this reason, they are often organized as a classifier cascade (Chen et al, 2012; Trapeznikov and Saligrama, 2013; Wang et al, 2015), so that a new input is first probed by an inexpensive test before being passed to a more expensive one.
- Recent works (Hanawal et al, 2017; Verma et al, 2019a, 2020a) propose methods for solving the USS problem, but they do not exploit contextual information.
- We propose a notion of contextual weak dominance (CWD) as a means to relate observed disagreements to differences in losses between any two arms
- We studied the unsupervised sequential selection problem with contextual information
- It is a partial monitoring stochastic contextual bandit problem, where the loss of an arm cannot be inferred from the observed feedback
- We modeled the disagreement probability between each pair of arms as linearly parameterized and developed an algorithm, USS-PD, that achieves O(log T) regret with high probability
- Side observations can be used to tighten the regret bounds. Another interesting future direction is to develop algorithms that decide whether to go further down the cascade as more information about the context is revealed along it
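The unsupervised comparison idea behind such cascades can be sketched in a few lines. This is our own simplification, not the paper's USS-PD: it assumes losses are unobservable but the disagreement rate between two arms' outputs is, and that (under a dominance-style condition) the disagreement upper-bounds the loss difference, so a deeper arm is preferred when its extra cost is outweighed by the estimated disagreement. The function names, the `p_hat` dictionary of disagreement estimates, and the threshold rule are illustrative assumptions.

```python
# Sketch (not the paper's algorithm): unsupervised pairwise comparison
# of arms in a cascade via disagreement estimates and known costs.

def preferred(i, j, costs, p_hat):
    """Return True if arm j (deeper, more costly) is preferred to arm i.

    Heuristic rule: the extra cost of arm j must be smaller than the
    estimated disagreement probability p_hat[(i, j)] between the arms.
    """
    assert j > i, "cascade order: j comes after i"
    return costs[j] - costs[i] < p_hat[(i, j)]

def select_arm(n_arms, costs, p_hat):
    # Walk down the cascade, moving deeper only while the deeper arm
    # is preferred under the rule above.
    best = 0
    for j in range(1, n_arms):
        if preferred(best, j, costs, p_hat):
            best = j
    return best

costs = [0.0, 0.1, 0.3]
p_hat = {(0, 1): 0.2, (0, 2): 0.25, (1, 2): 0.1}
print(select_arm(3, costs, p_hat))  # arm 1: going to arm 2 costs more than it helps
```

In the paper's setting, the disagreement estimates themselves are learned online from the contexts; the sketch only shows how such estimates, once available, drive the stopping decision in the cascade.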
- The authors compare the performance of USS-PD on four problem instances derived from the synthetic dataset.
- The regret with supervision is lower than the USS-PD regret in Fig. 1b
- It is qualitatively interesting because these plots demonstrate that, in typical cases, the unsupervised algorithm can eventually learn to perform as well as an algorithm with knowledge of the true labels.
- The authors show that the stronger the CWD property of the problem instance, the easier it is to identify the optimal arm and the lower the regret, as shown in Fig. 2a.
- The authors compare the performance of USS-PD with three baseline policies on problem instances derived from the Heart Disease dataset.
- Table1: Details of different problem instances (PIs) derived from synthetic datasets
- Table2: Details of different problem instances (PIs) derived from real datasets
- Stochastic Contextual multi-armed Bandits (SCB): In each round, the learner observes the context and decides which arm, among a finite number of arms, to apply (Beygelzimer et al, 2011). By playing an arm, the learner observes a stochastic reward that depends on the context and the arm selected. The most commonly studied model assumes that each arm is parameterized, and the mean reward of an arm is the inner product of the context and an unknown parameter associated with the arm. Contextual bandits have been applied to problems ranging from online advertising (Li et al, 2010; Chu et al, 2011) and recommendations (Langford and Zhang, 2008) to clinical trials (Woodroofe, 1979) and mobile health (Tewari and Murphy, 2017). Generalized linear models (GLM) assume that the mean reward is a non-linear link function of the inner product between the context vector and the unknown parameter vector (Filippi et al, 2010; Li et al, 2017). GLMs are also useful models for the classification problems where rewards, in the context of online learning problems, could be binary (Zhang et al, 2016; Jun et al, 2017). A more challenging non-parameterized version of the stochastic contextual bandits is studied in (Agarwal et al, 2014).
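As a toy illustration of the GLM setting described above (not the algorithm from any of the cited papers), the following sketch fits a logistic-link model online with regularized stochastic gradient steps, so the estimated mean reward for context x is sigmoid(x·theta). The class name, learning-rate schedule, and regularization strength are our own assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineLogisticGLM:
    """Minimal online GLM with a logistic link (illustrative sketch)."""

    def __init__(self, dim, lr=0.5, reg=1.0):
        self.theta = np.zeros(dim)  # unknown parameter estimate
        self.lr = lr
        self.reg = reg
        self.t = 0

    def predict(self, x):
        # Estimated mean (binary) reward for context x.
        return sigmoid(x @ self.theta)

    def update(self, x, y):
        # One stochastic gradient step on the regularized logistic loss.
        self.t += 1
        grad = (self.predict(x) - y) * x + (self.reg / self.t) * self.theta
        self.theta -= (self.lr / np.sqrt(self.t)) * grad

# Usage: recover the direction of a hidden parameter from one-bit feedback.
rng = np.random.default_rng(1)
theta_star = np.array([1.0, -1.0])
model = OnlineLogisticGLM(dim=2)
for _ in range(2000):
    x = rng.normal(size=2)
    y = float(rng.random() < sigmoid(x @ theta_star))
    model.update(x, y)
```

A full GLM bandit (e.g. GLM-UCB) would add a confidence width around `predict` to drive exploration; the sketch only shows the parametric estimation step.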
- Hanawal would like to thank the support from the INSPIRE faculty fellowship from DST, Government of India, a SEED grant (16IRCCSG010) from IIT Bombay, and the Early Career Research (ECR) Award from SERB
- Csaba Szepesvári gratefully acknowledges funding from the Canada CIFAR AI Chairs Program, Amii, and NSERC
- Venkatesh Saligrama would like to acknowledge NSF Grants DMS-2007350 (VS), CCF-2022446, CCF-1955981, and the Data Science Faculty Fellowship from the Rafik B. Hariri Institute
Study subjects and analysis
data samples: 5000
The details of the problem instances used are as follows. Synthetic Dataset: We consider a 3-dimensional synthetic dataset with 5000 data samples. Each sample is represented by x = (x1, x2, x3), where each xj is drawn uniformly at random from (−1, 1)
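The context-generation step described above can be reproduced with a short sketch. The seed and variable names are our own choices, and the labels and arm feedback of the instances are not specified here, so only the contexts are drawn.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility (our choice)

# 5000 samples, each a 3-dimensional context with coordinates drawn
# uniformly at random from (-1, 1), as in the synthetic dataset above.
n_samples, dim = 5000, 3
X = rng.uniform(-1.0, 1.0, size=(n_samples, dim))

print(X.shape)  # (5000, 3)
```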
- M. Chen, Z. Xu, K. Q. Weinberger, O. Chapelle, and D. Kedem. Classifier cascade: Tradeoff between accuracy and feature evaluation cost. In International Conference on Artificial Intelligence and Statistics, pages 235–242, 2012.
- Kirill Trapeznikov and Venkatesh Saligrama. Supervised sequential classification under budget constraints. In Artificial Intelligence and Statistics, pages 581–589, 2013.
- Joseph Wang, Kirill Trapeznikov, and Venkatesh Saligrama. Efficient learning by directed acyclic graph for resource constrained prediction. In Advances in Neural Information Processing Systems 28, pages 2152–2160. 2015.
- Manjesh Hanawal, Csaba Szepesvari, and Venkatesh Saligrama. Unsupervised sequential sensor acquisition. In Artificial Intelligence and Statistics, pages 803–811, 2017.
- Arun Verma, Manjesh K Hanawal, Csaba Szepesvari, and Venkatesh Saligrama. Online algorithm for unsupervised sensor selection. In Artificial Intelligence and Statistics, pages 3168–3176, 2019a.
- Arun Verma, Manjesh K Hanawal, and Nandyala Hemachandra. Thompson sampling for unsupervised sequential selection. In Asian Conference on Machine Learning, pages 545–560. PMLR, 2020a.
- Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, and Robert E. Schapire. Contextual bandit algorithms with supervised learning guarantees. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
- Tor Lattimore and Csaba Szepesvári. Bandit algorithms. Cambridge University Press, 2020.
- Thomas Bonald and Richard Combes. A minimax optimal algorithm for crowdsourcing. In Advances in Neural Information Processing Systems, pages 4352–4360, 2017.
- Matthäus Kleindessner and Pranjal Awasthi. Crowdsourcing with arbitrary adversaries. In International Conference on Machine Learning, pages 2713–2722, 2018.
- Arun Verma, Manjesh Hanawal, Arun Rajkumar, and Raman Sankaran. Censored semi-bandits: A framework for resource allocation with censored feedback. In Advances in Neural Information Processing Systems, pages 14499–14509, 2019b.
- Arun Verma, Manjesh K Hanawal, and Nandyala Hemachandra. Unsupervised online feature selection for cost-sensitive medical diagnosis. In 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS), pages 1–6. IEEE, 2020b.
- Sarah Filippi, Olivier Cappe, Aurélien Garivier, and Csaba Szepesvári. Parametric bandits: The generalized linear case. In Advances in Neural Information Processing Systems, pages 586–594, 2010.
- Wei Chu, Lihong Li, Lev Reyzin, and Robert E Schapire. Contextual bandits with linear payoff functions. In International Conference on Artificial Intelligence and Statistics, pages 208–214, 2011.
- Lihong Li, Yu Lu, and Dengyong Zhou. Provably optimal algorithms for generalized linear contextual bandits. In International Conference on Machine Learning, pages 2071–2080, 2017.
- Lihong Li, Wei Chu, John Langford, and Robert E Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web, pages 661–670. ACM, 2010.
- John Langford and Tong Zhang. The epoch-greedy algorithm for multi-armed bandits with side information. In Advances in neural information processing systems, pages 817–824, 2008.
- Michael Woodroofe. A one-armed bandit problem with a concomitant variable. Journal of the American Statistical Association, 74(368):799–806, 1979.
- Lijun Zhang, Tianbao Yang, Rong Jin, Yichi Xiao, and Zhi-hua Zhou. Online stochastic linear optimization under one-bit feedback. In International Conference on Machine Learning, pages 392–401, 2016.
- Kwang-Sung Jun, Aniruddha Bhargava, Robert Nowak, and Rebecca Willett. Scalable generalized linear bandits: Online computation and hashing. In Advances in Neural Information Processing Systems, pages 99–109, 2017.
- Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert E. Schapire. Taming the monster: A fast and simple algorithm for contextual bandits. In International Conference on Machine Learning, 2014.
- Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov):397–422, 2002.
- Varsha Dani, Thomas P Hayes, and Sham M Kakade. Stochastic linear optimization under bandit feedback. In COLT, pages 355–366, 2008.
- Paat Rusmevichientong and John N Tsitsiklis. Linearly parameterized bandits. Mathematics of Operations Research, 35(2):395–411, 2010.
- Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems, pages 2312–2320, 2011.
- Nicolo Cesa-Bianchi, Gábor Lugosi, and Gilles Stoltz. Regret minimization under partial monitoring. Mathematics of Operations Research, 31(3):562–580, 2006.
- Gábor Bartók, Dean P Foster, Dávid Pál, Alexander Rakhlin, and Csaba Szepesvári. Partial monitoring—classification, regret bounds, and algorithms. Mathematics of Operations Research, 39(4):967–997, 2014.
- Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices, page 210–268. Cambridge University Press, 2012. doi: 10.1017/CBO9780511794308.006.
- UCI Machine Learning, Kaggle. Pima Indians Diabetes Database, 2016. URL https://www.kaggle.com/uciml/pima-indians-diabetes-database.
- Robert Detrano (V.A. Medical Center, Long Beach and Cleveland Clinic Foundation). Heart Disease dataset; donor: David W. Aha, 1998. URL https://archive.ics.uci.edu/ml/datasets/Heart+Disease.
- Dua Dheeru and Efi Karra Taniskidou. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml.