Online Algorithm for Unsupervised Sequential Selection with Contextual Information

NeurIPS 2020


Abstract

In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback. In our setup, arms are associated with fixed costs and are ordered, forming a cascade. In each round, a context is presented, and t…

Introduction
  • Industrial systems, such as those found in medical, airport security, and manufacturing, utilize a suite of tests or classifiers for monitoring patients, people, and products.
  • Tests have costs, with the more intrusive and informative ones incurring higher monetary cost and higher latency.
  • For this reason, they are often organized as a classifier cascade (Chen et al., 2012; Trapeznikov and Saligrama, 2013; Wang et al., 2015), so that a new input is first probed by an inexpensive test before being passed to a more expensive one.
  • Recent works (Hanawal et al., 2017; Verma et al., 2019a, 2020a) propose methods for solving the USS problem.
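The cascade structure described above can be sketched in a few lines. The tests, confidence threshold, and costs below are hypothetical illustrations, not the paper's actual classifiers; the point is only that a cheap test handles easy inputs and escalates uncertain ones to a costlier, more accurate test.

```python
def cheap_test(x):
    # Hypothetical inexpensive classifier: confident only far from the boundary.
    label = 1 if x > 0 else 0
    confident = abs(x) > 0.5
    return label, confident

def expensive_test(x):
    # Hypothetical accurate but costly classifier.
    return 1 if x > 0.1 else 0

def cascade_predict(x, cheap_cost=1.0, expensive_cost=10.0):
    # Probe with the cheap test first; escalate only when it is unsure.
    label, confident = cheap_test(x)
    cost = cheap_cost
    if not confident:
        label = expensive_test(x)
        cost += expensive_cost
    return label, cost

print(cascade_predict(0.9))   # easy input: cheap test suffices -> (1, 1.0)
print(cascade_predict(0.2))   # hard input: escalated -> (1, 11.0)
```

An easy input stops at the first stage and pays only the cheap cost; an ambiguous one pays both costs but gets the more reliable label.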
Highlights
  • Industrial systems, such as those found in medical, airport security, and manufacturing, utilize a suite of tests or classifiers for monitoring patients, people, and products
  • We propose the notion of contextual weak dominance (CWD) as a means to relate observed disagreements to differences in losses between any two arms
  • We studied the unsupervised sequential selection problem with contextual information
  • It is a partial monitoring stochastic contextual bandit problem, where the loss of an arm cannot be inferred from the observed feedback
  • We modeled the disagreement probability between each pair of arms as linearly parameterized and developed an algorithm, USS-PD, that achieves O(log T) regret with high probability
  • By using side observations, one can tighten the regret bounds. Another interesting future direction is to develop algorithms that decide whether to go further down the cascade as more information about the context is revealed
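The key observation in the highlights is that disagreement between two arms is observable even when true labels are not, so a parameterized disagreement model can be fit online. The sketch below assumes a logistic link on the inner product of context and a hypothetical pairwise parameter `theta_true`, estimated by plain SGD; the link, step size, and parameter values are illustrative assumptions, not the paper's exact estimator.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameter for one pair of arms (i, j): the probability that
# the two arms disagree on context x is modeled as sigmoid(<x, theta>).
theta_true = [1.0, -2.0, 0.5]
theta_hat = [0.0, 0.0, 0.0]
lr = 0.1  # illustrative step size

for t in range(20000):
    x = [random.uniform(-1.0, 1.0) for _ in range(3)]
    # The disagreement indicator is observable feedback; no true label needed.
    d = 1 if random.random() < sigmoid(sum(a * b for a, b in zip(x, theta_true))) else 0
    p_hat = sigmoid(sum(a * b for a, b in zip(x, theta_hat)))
    # Online logistic-regression (SGD) update on the disagreement indicator.
    theta_hat = [w + lr * (d - p_hat) * xi for w, xi in zip(theta_hat, x)]

print([round(w, 2) for w in theta_hat])  # approaches theta_true
```

After enough rounds the estimate tracks the true pairwise parameter, which is what lets disagreement statistics stand in for unobservable losses.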
Results
  • The authors compare the performance of USS-PD on four problem instances derived from the synthetic dataset.
  • The regret with supervision is lower than the USS-PD regret in Fig. 1b
  • It is qualitatively interesting because these plots demonstrate that, in typical cases, the unsupervised algorithm can eventually learn to perform as well as an algorithm with knowledge of the true labels.
  • The authors show that the stronger the CWD property of a problem instance, the easier it is to identify the optimal arm and the lower the resulting regret, as shown in Fig. 2a.
  • The authors compare the performance of USS-PD with three baseline policies on problem instances derived from the Heart Disease dataset.
Conclusion
  • The authors studied the unsupervised sequential selection problem with contextual information
  • It is a partial monitoring stochastic contextual bandit problem, where the loss of an arm cannot be inferred from the observed feedback.
  • By using side observations, one can tighten the regret bounds
  • Another interesting future direction is to develop algorithms that decide whether to go further down the cascade as more information about the context is revealed
Tables
  • Table1: Details of different problem instances (PIs) derived from synthetic datasets
  • Table2: Details of different problem instances (PIs) derived from real datasets
Related work
  • Stochastic Contextual multi-armed Bandits (SCB): In each round, the learner observes the context and decides which arm, among a finite number of arms, to apply (Beygelzimer et al, 2011). By playing an arm, the learner observes a stochastic reward that depends on the context and the arm selected. The most commonly studied model assumes that each arm is parameterized, and the mean reward of an arm is the inner product of the context and an unknown parameter associated with the arm. Contextual bandits have been applied to problems ranging from online advertising (Li et al, 2010; Chu et al, 2011) and recommendations (Langford and Zhang, 2008) to clinical trials (Woodroofe, 1979) and mobile health (Tewari and Murphy, 2017). Generalized linear models (GLM) assume that the mean reward is a non-linear link function of the inner product between the context vector and the unknown parameter vector (Filippi et al, 2010; Li et al, 2017). GLMs are also useful models for the classification problems where rewards, in the context of online learning problems, could be binary (Zhang et al, 2016; Jun et al, 2017). A more challenging non-parameterized version of the stochastic contextual bandits is studied in (Agarwal et al, 2014).
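The "mean reward is the inner product of the context and an unknown parameter" model above can be sketched with a ridge-regression estimate, the standard building block of LinUCB-style algorithms. The dimension, sample count, noise level, and regularizer below are illustrative choices, not values from any cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
theta_star = rng.normal(size=d)                    # unknown per-arm parameter
X = rng.uniform(-1.0, 1.0, size=(2000, d))         # observed contexts
y = X @ theta_star + 0.1 * rng.normal(size=2000)   # noisy linear rewards

# Ridge-regression estimate: theta_hat = (X'X + lam I)^{-1} X'y.
lam = 1.0
A = X.T @ X + lam * np.eye(d)
theta_hat = np.linalg.solve(A, X.T @ y)

print(np.abs(theta_hat - theta_star).max())  # estimation error shrinks with data
```

A GLM bandit replaces the linear map with a link function (e.g. a logistic link for binary rewards), but the estimation principle is the same: regress observed rewards on contexts to recover the arm parameter.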
Funding
  • Hanawal would like to thank the support from INSPIRE faculty fellowships from DST, Government of India, SEED grant (16IRCCSG010) from IIT Bombay, and Early Career Research (ECR) Award from SERB
  • Csaba Szepesvári gratefully acknowledges funding from the Canada CIFAR AI Chairs Program, Amii, and NSERC
  • Venkatesh Saligrama would like to acknowledge NSF Grants DMS-2007350 (VS), CCF-2022446, CCF-1955981, and the Data Science Faculty Fellowship from the Rafik B
Study subjects and analysis
data samples: 5000
The details of the problem instances used are as follows. Synthetic Dataset: We consider a 3-dimensional synthetic dataset with 5000 data samples. Each sample is represented by x = (x1, x2, x3), where each value xj is drawn uniformly at random from (−1, 1).
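The synthetic dataset described above can be reconstructed directly from its description (the random seed is an arbitrary choice for reproducibility):

```python
import random

# 5000 samples, each x = (x1, x2, x3) with every coordinate drawn
# uniformly at random from the open interval (-1, 1).
random.seed(42)
data = [tuple(random.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(5000)]

print(len(data), data[0])
```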

Reference
  • M. Chen, Z. Xu, K. Q. Weinberger, O. Chapelle, and D. Kedem. Classifier cascade: Tradeoff between accuracy and feature evaluation cost. In International Conference on Artificial Intelligence and Statistics, pages 235–242, 2012.
  • Kirill Trapeznikov and Venkatesh Saligrama. Supervised sequential classification under budget constraints. In Artificial Intelligence and Statistics, pages 581–589, 2013.
  • Joseph Wang, Kirill Trapeznikov, and Venkatesh Saligrama. Efficient learning by directed acyclic graph for resource constrained prediction. In Advances in Neural Information Processing Systems 28, pages 2152–2160, 2015.
  • Manjesh Hanawal, Csaba Szepesvari, and Venkatesh Saligrama. Unsupervised sequential sensor acquisition. In Artificial Intelligence and Statistics, pages 803–811, 2017.
  • Arun Verma, Manjesh K Hanawal, Csaba Szepesvari, and Venkatesh Saligrama. Online algorithm for unsupervised sensor selection. In Artificial Intelligence and Statistics, pages 3168–3176, 2019a.
  • Arun Verma, Manjesh K Hanawal, and Nandyala Hemachandra. Thompson sampling for unsupervised sequential selection. In Asian Conference on Machine Learning, pages 545–560. PMLR, 2020a.
  • Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, and Robert E. Schapire. Contextual bandit algorithms with supervised learning guarantees. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
  • Tor Lattimore and Csaba Szepesvári. Bandit Algorithms. Cambridge University Press, 2020.
  • Thomas Bonald and Richard Combes. A minimax optimal algorithm for crowdsourcing. In Advances in Neural Information Processing Systems, pages 4352–4360, 2017.
  • Matthäus Kleindessner and Pranjal Awasthi. Crowdsourcing with arbitrary adversaries. In International Conference on Machine Learning, pages 2713–2722, 2018.
  • Arun Verma, Manjesh Hanawal, Arun Rajkumar, and Raman Sankaran. Censored semi-bandits: A framework for resource allocation with censored feedback. In Advances in Neural Information Processing Systems, pages 14499–14509, 2019b.
  • Arun Verma, Manjesh K Hanawal, and Nandyala Hemachandra. Unsupervised online feature selection for cost-sensitive medical diagnosis. In 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS), pages 1–6. IEEE, 2020b.
  • Sarah Filippi, Olivier Cappe, Aurélien Garivier, and Csaba Szepesvári. Parametric bandits: The generalized linear case. In Advances in Neural Information Processing Systems, pages 586–594, 2010.
  • Wei Chu, Lihong Li, Lev Reyzin, and Robert E Schapire. Contextual bandits with linear payoff functions. In International Conference on Artificial Intelligence and Statistics, pages 208–214, 2011.
  • Lihong Li, Yu Lu, and Dengyong Zhou. Provably optimal algorithms for generalized linear contextual bandits. In International Conference on Machine Learning, pages 2071–2080, 2017.
  • Lihong Li, Wei Chu, John Langford, and Robert E Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670. ACM, 2010.
  • John Langford and Tong Zhang. The epoch-greedy algorithm for multi-armed bandits with side information. In Advances in Neural Information Processing Systems, pages 817–824, 2008.
  • Michael Woodroofe. A one-armed bandit problem with a concomitant variable. Journal of the American Statistical Association, 74(368):799–806, 1979.
  • Lijun Zhang, Tianbao Yang, Rong Jin, Yichi Xiao, and Zhi-Hua Zhou. Online stochastic linear optimization under one-bit feedback. In International Conference on Machine Learning, pages 392–401, 2016.
  • Kwang-Sung Jun, Aniruddha Bhargava, Robert Nowak, and Rebecca Willett. Scalable generalized linear bandits: Online computation and hashing. In Advances in Neural Information Processing Systems, pages 99–109, 2017.
  • Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert E. Schapire. Taming the monster: A fast and simple algorithm for contextual bandits. In International Conference on Machine Learning, 2014.
  • Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov):397–422, 2002.
  • Varsha Dani, Thomas P Hayes, and Sham M Kakade. Stochastic linear optimization under bandit feedback. In COLT, pages 355–366, 2008.
  • Paat Rusmevichientong and John N Tsitsiklis. Linearly parameterized bandits. Mathematics of Operations Research, 35(2):395–411, 2010.
  • Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems, pages 2312–2320, 2011.
  • Nicolo Cesa-Bianchi, Gábor Lugosi, and Gilles Stoltz. Regret minimization under partial monitoring. Mathematics of Operations Research, 31(3):562–580, 2006.
  • Gábor Bartók, Dean P Foster, Dávid Pál, Alexander Rakhlin, and Csaba Szepesvári. Partial monitoring—classification, regret bounds, and algorithms. Mathematics of Operations Research, 39(4):967–997, 2014.
  • Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices, pages 210–268. Cambridge University Press, 2012. doi: 10.1017/CBO9780511794308.006.
  • UCI Machine Learning, Kaggle. Pima Indians Diabetes Database, 2016. URL https://www.kaggle.com/uciml/pima-indians-diabetes-database.
  • Robert Detrano. Heart Disease Data Set. V.A. Medical Center, Long Beach and Cleveland Clinic Foundation; donor: David W. Aha, 1998. URL https://archive.ics.uci.edu/ml/datasets/Heart+Disease.
  • Dua Dheeru and Efi Karra Taniskidou. UCI Machine Learning Repository, 2017. URL http://archive.ics.uci.edu/ml.