Regret in Online Recommendation Systems

NeurIPS 2020 (2020)


Abstract

This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time. In each round, a user, randomly picked from a population of m users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of n items. Import…
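To make the setting concrete, here is a minimal simulation sketch of the interaction model the abstract describes: at each round a uniformly random user arrives, the decision-maker picks an item never shown to that user before, and observes a Bernoulli success. All parameter values, the uniform baseline policy, and the success-rate matrix rho are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, T = 100, 200, 2000                   # users, items, horizon (illustrative)
rho = rng.uniform(0.1, 0.9, size=(n, m))   # unknown (item, user) success rates

seen = [set() for _ in range(m)]           # items already shown to each user

def uniform_policy(u):
    """Naive baseline: recommend any item not yet shown to user u."""
    candidates = [i for i in range(n) if i not in seen[u]]
    return int(rng.choice(candidates))

successes = 0
for t in range(T):
    u = int(rng.integers(m))               # a user arrives, picked uniformly at random
    i = uniform_policy(u)
    seen[u].add(i)                         # no-repetition: never recommend i to u again
    successes += rng.random() < rho[i, u]  # Bernoulli success feedback
print(f"successful recommendations: {successes}/{T}")
```

Any learning algorithm studied in the paper would replace uniform_policy here; the bookkeeping in seen is what the no-repetition constraint amounts to operationally.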

Introduction
  • Recommendation systems [28] have, over the last two decades, triggered important research efforts, mainly focused on the design and analysis of more efficient algorithms.
  • There, the authors explicitly model the no-repetition constraint but consider user clusters only, and do not provide regret lower bounds.
  • When the algorithm recommends an item i for the first time, it is assigned to cluster Ik with probability αk as in Model A.
Highlights
  • Most recommendation systems operate in an online setting, where items are sequentially recommended to users over time
  • We study the regret of online recommendation algorithms, defined as the difference between their expected number of successful recommendations and that obtained under an Oracle algorithm aware of the structure and of the success rates of each (item, user) pair (written out after this list)
  • We investigate three types of systems depending on the structural assumptions made on the success rates ρ = (ρ(i, u))_{i∈I, u∈U}
  • We present Explore-Cluster with Upper Confidence Sets (EC-UCS), an algorithm that essentially exhibits the same regret scaling as our lower bound
  • This paper proposes and analyzes several models for online recommendation systems
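The regret notion in the highlight above can be written out explicitly. The following is a sketch of the definition, under the notational assumption (ours, not necessarily the paper's) that X_t^π ∈ {0, 1} indicates a successful recommendation at round t under algorithm π, and π★ denotes the Oracle:

```latex
% Regret of algorithm \pi over T rounds, relative to the Oracle \pi^\star
% that knows the structure and the success rates \rho(i, u).
R^{\pi}(T) \;=\; \mathbb{E}\Bigl[\sum_{t=1}^{T} X_t^{\pi^\star}\Bigr]
           \;-\; \mathbb{E}\Bigl[\sum_{t=1}^{T} X_t^{\pi}\Bigr]
```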
Results
  • The authors are able to quantify the minimal regret induced by the specific features of the problem: (i) the no-repetition constraint, (ii) the unknown success probabilities, (iii) the unknown item clusters, (iv) the unknown user clusters.
  • Rnr(T) and Ric(T), the regrets due to the no-repetition constraint and to the unknown item clusters, respectively, are defined by Rnr(T) := ∑_{k=1}^{n} αk Δk and Ric(T) := ∑_{k=1}^{n} αk φ(k, m, p) Δk.
  • From the above theorem, by analyzing how Rnr(T), Ric(T), and Rsp(T) scale, the authors deduce that: (i) when T = o(m log(m)), the regret arises mainly from either the no-repetition constraint or the need to learn the success probabilities, and it scales at least as max{Rnr(T), Rsp(T)}.
  • (iii) When T = ω(m log(m)), the regret arises mainly from either the no-repetition constraint or the need to learn the item clusters, and it scales at least as max{Rnr(T), Ric(T)}.
  • The regret is induced by the no-repetition constraint, and by the fact that the success rate of an item when it is first selected, as well as the distribution ζ, are unknown.
  • Its regret satisfies: for all T ≥ 2m such that m ≥ c/min_k Δk², Rπ(T) ≥ max{Rnr(T), Ric(T), Ruc(T)}, where Rnr(T), Ric(T), and Ruc(T) are the regrets due to the no-repetition constraint, to the unknown item clusters, and to the unknown user clusters, respectively.
  • Explore-Cluster-and-Test (ECT) achieves a better regret scaling and complies with the no-repetition constraint.
  • ECT is designed to comply with the no-repetition constraint: for example, in the exploration phase, when a user arrives and no item from S can be recommended due to the constraint, an item that does not violate the constraint is selected at random (see the sketch after this list).
  • The regret lower bound of Theorem 1 states that, for any algorithm π, Rπ(T) = Ω(N), and that if π is uniformly good, Rπ(T) = Ω(max{N, log(T)}).
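As referenced in the ECT bullet above, here is a minimal Python sketch of how an exploration phase can respect the no-repetition constraint while estimating success rates. The per-item budget constant, the set S, and the callbacks next_user and recommend are hypothetical placeholders, not the paper's exact specification:

```python
import math
import random

def explore(S, n, T, epsilon, seen, next_user, recommend):
    """Exploration sketch: show each item in S to a budget of users of order
    log(T), never repeating an item for a user; return empirical success rates.

    seen[u]          -- set of items already recommended to user u
    next_user()      -- returns the id of the next arriving user
    recommend(u, i)  -- shows item i to user u, returns 1 on success, 0 otherwise
    """
    budget = math.ceil(epsilon ** -2 * math.log(T))  # per-item budget (illustrative constant)
    shown = {i: 0 for i in S}
    wins = {i: 0 for i in S}
    while any(c < budget for c in shown.values()):
        u = next_user()
        # Prefer an under-explored item from S that user u has not seen yet.
        pool = [i for i in S if shown[i] < budget and i not in seen[u]]
        if pool:
            i = min(pool, key=shown.get)  # least-explored admissible item
        else:
            # Fallback: any item not violating the no-repetition constraint.
            i = random.choice([j for j in range(n) if j not in seen[u]])
        seen[u].add(i)
        r = recommend(u, i)
        if i in shown:
            shown[i] += 1
            wins[i] += r
    return {i: wins[i] / shown[i] for i in S}  # empirical success rates
```

The empirical rates returned here correspond to the quantities computed for each item i ∈ S in the exploration phase described under "Study Subjects and Analysis" below.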
Conclusion
  • The authors present Explore-Cluster with Upper Confidence Sets (EC-UCS), an algorithm that essentially exhibits the same regret scaling as the lower bound.
  • In Appendix A.3, the authors present ECB, a much simpler algorithm than EC-UCS, but whose regret upper bound, derived in Appendix J, always scales as m log(N).
  • The authors may try to extend the analysis to the very popular linear reward structure, while accounting for the no-repetition constraint.
Related Work
  • The design of recommendation systems has been framed as structured bandit problems in the past. Most of this work considers a linear reward structure (in the spirit of the matrix factorization approach); see e.g. [9], [10], [22], [20], [21], [11]. These papers ignore the no-repetition constraint (a usual assumption there is that when a user arrives, a set of fresh items can be recommended). In [24], the authors try to include this constraint but do not present any analytical result. Furthermore, the structures we impose in our models differ from those considered in the low-rank matrix factorization approach.

    Our work also relates to the literature on clustered bandits. Again, the no-repetition constraint is not modeled. In addition, most often, only user clusters [6], [23] or only item clusters [18], [14] are considered. Low-rank bandits extend clustered bandits by modeling the (item, user) success rates as a low-rank matrix, see [15], [25], still without accounting for the no-repetition constraint and without a complete analysis (no precise regret lower bounds).
Funding
  • Ariu was supported by the Nakajima Foundation Scholarship.
  • Ryu was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00075, Artificial Intelligence Graduate School Program (KAIST)).
  • Proutiere's research is supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation.
Study Subjects and Analysis
  • In the exploration phase, on the order of ε⁻² log(T) items are selected and each selected item is recommended to order log(T) users; for each item i ∈ S, the empirical success rate ρ̂i is then computed.

References
  • [1] Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.
  • [2] Thomas Bonald and Alexandre Proutiere. Two-target algorithms for infinite-armed bandits with Bernoulli rewards. In Advances in Neural Information Processing Systems 26, pages 2184–2192, 2013.
  • [3] Guy Bresler and Mina Karzand. Regret bounds and regimes of optimality for user-user and item-item collaborative filtering. In 2018 Information Theory and Applications Workshop (ITA), pages 1–37, 2018.
  • [4] Guy Bresler, George H. Chen, and Devavrat Shah. A latent source model for online collaborative filtering. In Advances in Neural Information Processing Systems, pages 3347–3355, 2014.
  • [5] Sebastien Bubeck, Vianney Perchet, and Philippe Rigollet. Bounded regret in stochastic multi-armed bandits. In Proceedings of the 26th Annual Conference on Learning Theory, pages 122–134, 2013.
  • [6] Loc Bui, Ramesh Johari, and Shie Mannor. Clustered bandits. arXiv preprint arXiv:1206.4169, 2012.
  • [7] Richard Combes, Chong Jiang, and Rayadurgam Srikant. Bandits with budgets: Regret lower bounds and optimal algorithms. In Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 245–257, 2015.
  • [8] Aurélien Garivier, Pierre Ménard, and Gilles Stoltz. Explore first, exploit next: The true shape of regret in bandit problems. Mathematics of Operations Research, 2018.
  • [9] Claudio Gentile, Shuai Li, and Giovanni Zappella. Online clustering of bandits. In Proceedings of the 31st International Conference on Machine Learning, pages 757–765, 2014.
  • [10] Claudio Gentile, Shuai Li, Purushottam Kar, Alexandros Karatzoglou, Giovanni Zappella, and Evans Etrue. On context-dependent clustering of bandits. In Proceedings of the 34th International Conference on Machine Learning, pages 1253–1262, 2017.
  • [11] Aditya Gopalan, Odalric-Ambrym Maillard, and Mohammadi Zaki. Low-rank bandits with latent mixtures. arXiv preprint arXiv:1609.01508, 2016.
  • [12] Botao Hao, Tor Lattimore, and Csaba Szepesvari. Adaptive exploration in linear contextual bandit. arXiv preprint arXiv:1910.06996, 2019.
  • [13] Reinhard Heckel and Kannan Ramchandran. The sample complexity of online one-class collaborative filtering. In Proceedings of the 34th International Conference on Machine Learning, pages 1452–1460, 2017.
  • [14] Matthieu Jedor, Vianney Perchet, and Jonathan Louedec. Categorized bandits. In Advances in Neural Information Processing Systems, pages 14399–14409, 2019.
  • [15] Kwang-Sung Jun, Rebecca Willett, Stephen Wright, and Robert Nowak. Bilinear bandits with low-rank structure. In Proceedings of the 36th International Conference on Machine Learning, pages 3163–3172, 2019.
  • [16] O. Kallenberg. Random Measures, Theory and Applications. Probability Theory and Stochastic Modelling. Springer International Publishing, 2017. ISBN 9783319415987.
  • [17] Robert D. Kleinberg, Alexandru Niculescu-Mizil, and Yogeshwer Sharma. Regret bounds for sleeping experts and bandits. In Proceedings of the 21st Annual Conference on Learning Theory, pages 425–436, 2008.
  • [18] Joon Kwon, Vianney Perchet, and Claire Vernade. Sparse stochastic bandits. In Proceedings of the 30th Conference on Learning Theory, pages 1269–1270, 2017.
  • [19] Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985.
  • [20] Shuai Li and Shengyu Zhang. Online clustering of contextual cascading bandits. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • [21] Shuai Li, Alexandros Karatzoglou, and Claudio Gentile. Collaborative filtering bandits. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 539–548, 2016.
  • [22] Shuai Li, Wei Chen, and Kwong-Sak Leung. Improved algorithm on online clustering of bandits. arXiv preprint arXiv:1902.09162, 2019.
  • [23] Odalric-Ambrym Maillard and Shie Mannor. Latent bandits. In Proceedings of the 31st International Conference on Machine Learning, pages 136–144, 2014.
  • [24] Jérémie Mary, Romaric Gaudel, and Philippe Preux. Bandits and recommender systems. In International Workshop on Machine Learning, Optimization and Big Data, pages 325–336, 2015.
  • [25] Jonas W. Mueller, Vasilis Syrgkanis, and Matt Taddy. Low-rank bandit methods for high-dimensional dynamic pricing. In Advances in Neural Information Processing Systems, pages 15442–15452, 2019.
  • [26] Jungseul Ok, Se-Young Yun, Alexandre Proutiere, and Rami Mochaourab. Collaborative clustering: Sample complexity and efficient algorithms. In International Conference on Algorithmic Learning Theory, pages 288–329, 2017.
  • [27] Martin Raab and Angelika Steger. Balls into bins - a simple and tight analysis. In Proceedings of the Second International Workshop on Randomization and Approximation Techniques in Computer Science, pages 159–170, 1998.
  • [28] Paul Resnick and Hal R. Varian. Recommender systems. Communications of the ACM, 40(3):56–58, 1997.
  • [29] Daniel Russo and Benjamin Van Roy. Satisficing in time-sensitive bandit learning. arXiv preprint arXiv:1803.02855, 2018.
  • [30] Joel A. Tropp. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning, 8(1-2):1–230, 2015.
  • [31] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Science & Business Media, 2008.
  • [32] Se-Young Yun and Alexandre Proutiere. Optimal cluster recovery in the labeled stochastic block model. In Advances in Neural Information Processing Systems, pages 965–973, 2016.
  • [33] Se-Young Yun, Alexandre Proutiere, et al. Streaming, memory limited algorithms for community detection. In Advances in Neural Information Processing Systems, pages 3167–3175, 2014.
Authors
Kaito Ariu
Narae Ryu