
Efficient Context-Aware Sequential Recommender System.

WWW '18: The Web Conference 2018, Lyon, France, April 2018, pp. 1391-1394.


Abstract

Traditional collaborative filtering and content-based approaches attempt to learn a static recommendation model in a batch fashion. These approaches are not suitable in highly dynamic recommendation scenarios, such as news recommendation and computational advertisement. Due to this well-known limitation, in the last decade a lot of effort ...

Introduction
  • The recommender system field has been widely studied since the early 90s [18]. Recommendation engines are an essential part of many on-line businesses such as Amazon [20] and Netflix [11].
  • Common scenarios that may benefit from this solution are recommender systems that exploit a Reproducing Kernel Hilbert Space for deriving product similarities [3, 6], and more generally recommendation algorithms that operate in scenarios with many content features [13, 14].
  • The reason is that exploration is necessary to refine the learner's estimates even if it may decrease short-term rewards, while exploitation is necessary in order to select the arm that currently appears optimal.
Highlights
  • The recommender system field has been widely studied since the early 90s [18]
  • Typical examples of modern recommendation problems are news articles [13] and computational advertisement [14]. Due to their sequential nature and their dependence on the current context, these application domains are not suited to the standard batch algorithms commonly used in static settings
  • Common scenarios that may benefit from this solution are recommender systems that exploit a Reproducing Kernel Hilbert Space for deriving product similarities [3, 6], and more generally recommendation algorithms that operate in scenarios with many content features [13, 14]
  • We have proposed the adoption of a deterministic sketching technique for compressing the original feature space
  • Future work includes evaluation on real datasets, the study of theoretical guarantees, and the adoption of size-adaptive sketching techniques as in [17]
Results
  • Contextual MAB learners, which exploit the context as additional information in order to maximize their reward, are sequential decision-making models that extend Robbins' model and are widely adopted in the Recommender System community [9].
  • The most popular contextual learners [9, 16] assume a linear relation governing the reward in terms of content features, and rely on ridge regression for the estimation of the hidden parameter vector.
  • At each round, the learner observes the current user u_t and a set of arms I_t together with their feature vectors x_{i,t} for all i ∈ I_t.
  • Based on the payoffs observed in the previous trials [t − 1] for the chosen contexts, the learner chooses the item i(t) to be recommended.
  • It is important to emphasize that in the bandit setting, the only observed reward per round is the one corresponding to the recommended arm; no feedback is given for the unchosen arms i ≠ i(t).
  • Coming back to the recommendation problem, the authors can consider item contexts as the arm set.
  • Init: b_0 = 0. For t = 1, 2, ..., T: compute θ̂ as specified in Equation 4; receive the set of contents I_t := {x_{t,1}, ..., x_{t,K}}; select arm i_t according to Equation 5; observe the reward r_t associated with arm i_t; then update the design matrix of contexts observed until round t and the column-vector of the corresponding feedbacks (a minimal code sketch of this loop is given after this list).
  • All of them have been used to evaluate the algorithm over T = 5000 trials, comparing against the performance obtained by the original linear contextual learner, which is the target of the approximation.
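
Equations 4 and 5 are not reproduced on this page, so the following is only a minimal sketch of a generic ridge-regression (LinUCB-style) loop matching the protocol above. The callback names get_contexts and pull, the regularizer lam, and the exploration weight alpha are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def linucb(get_contexts, pull, d, T, alpha=1.0, lam=1.0):
        """Generic LinUCB-style loop: ridge estimate + optimistic arm choice.

        get_contexts(t) -> (K, d) array of arm feature vectors for round t
        pull(t, i)      -> observed reward of the chosen arm only
        (both are hypothetical callbacks supplied by the caller)
        """
        A = lam * np.eye(d)   # regularized design matrix: sum_s x_s x_s^T + lam*I
        b = np.zeros(d)       # reward-weighted feature sum: sum_s r_s x_s
        for t in range(T):
            X = get_contexts(t)                # current arm set I_t
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                  # ridge estimate (role of Eq. 4)
            # optimistic score: predicted reward + confidence width (role of Eq. 5)
            width = np.sqrt(np.einsum('ij,jk,ik->i', X, A_inv, X))
            i_t = int(np.argmax(X @ theta + alpha * width))
            r_t = pull(t, i_t)                 # bandit feedback: chosen arm only
            A += np.outer(X[i_t], X[i_t])      # rank-one update of the statistics
            b += r_t * X[i_t]
        return theta

With d large, the per-round manipulation of the d x d matrix A is exactly the cost that motivates the sketching idea discussed in the Conclusion.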
Conclusion
  • The authors have initiated an investigation of approximation in contextual linear bandit algorithms operating in relevant scenarios where vendors acquire and maintain a large amount of content in their repository, or may be using a Reproducing Kernel Hilbert Space for storing similarities.
  • The authors have proposed the adoption of a deterministic sketching technique for compressing the original feature space (a minimal sketch of such a technique is given after this list).
  • Future work includes evaluation on real datasets, the study of theoretical guarantees, and the adoption of size-adaptive sketching techniques as in [17].
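
The summary does not spell out the sketching technique, but given reference [19] it is presumably in the style of Frequent Directions. Below is a minimal sketch of that algorithm; the halving rule and variable names follow the published description, not the paper's code.

    import numpy as np

    def frequent_directions(X, ell):
        """Deterministic sketch B (ell x d) with B^T B ~ X^T X, as in [19]."""
        n, d = X.shape
        B = np.zeros((ell, d))
        for row in X:
            zero_rows = np.where(~B.any(axis=1))[0]
            if len(zero_rows) == 0:
                # sketch is full: shrink all singular values so that
                # at least the bottom half of the rows become exactly zero
                U, s, Vt = np.linalg.svd(B, full_matrices=False)
                delta = s[ell // 2] ** 2
                s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
                B = np.diag(s) @ Vt
                zero_rows = np.where(~B.any(axis=1))[0]
            B[zero_rows[0]] = row   # append the incoming row into a free slot
        return B

In a bandit learner, the ell x d sketch B can stand in for the full stream of observed contexts: the correlation structure B^T B approximates X^T X at a cost that depends on ell rather than on the number of rounds.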
Tables
  • Table 1: Statistics of the datasets used
  • Table 2: Policy performance reported as mean regret ± regret standard deviation
Related work
  • The main goal of a well-tuned multi-armed bandit (MAB) model is to manage the exploration-exploitation dilemma that typically arises in a sequential setting with partial feedback. To act eventually optimally, the learner exploits past observations to select the arm (item) that appears best. On the other hand, such an arm could instead be suboptimal, due to lack of knowledge. To avoid always selecting a possibly suboptimal arm, the learner must also consider the option of exploring, by selecting a so-far-unseen arm (e.g., an item whose feature vector is orthogonal to all the vectors selected up to the current trial), with the objective of gathering more information about the new direction. Clearly, naive strategies of pure exploration or pure exploitation are far from effective: exploration is necessary to refine the learner's estimates even if it may decrease short-term rewards, while exploitation is necessary to select the arm that currently appears optimal (a minimal illustration follows this list). This ability allows MAB algorithms to outperform even trendy deep-learning approaches when there is a continuous need for recommendations over cold items.
  • The MAB model was originally proposed and investigated by Robbins [15], attracting the interest of researchers across different communities. Contextual MAB learners, which exploit the context as additional information in order to maximize their reward, are sequential decision-making models that extend Robbins' model and are widely adopted in the Recommender System community [9]. [1, 5, 10, 21] try to solve the personalized recommendation problem for fluid (new) users when no user content features are available (cold user), with the aim of understanding as quickly as possible the class profile the user belongs to. Collaborative strategies [3, 7, 12, 22, 23] instead try to speed up the learning process by collaboratively sharing information. In particular, they strengthen the learning phase by agglomerating similar user profiles into clusters, requiring fewer rewards per user and less exploration.
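
To make the dilemma concrete, here is a minimal non-contextual UCB1-style learner on K arms; it illustrates the exploration-bonus idea discussed above and is not the contextual algorithm studied in the paper. The pull callback is a hypothetical reward oracle supplied by the caller.

    import numpy as np

    def ucb1(pull, K, T):
        """Play each arm once, then pick the arm maximizing
        empirical mean + exploration bonus (optimism)."""
        counts = np.zeros(K)
        means = np.zeros(K)
        for t in range(1, T + 1):
            if t <= K:
                arm = t - 1                          # pure exploration at the start
            else:
                bonus = np.sqrt(2.0 * np.log(t) / counts)
                arm = int(np.argmax(means + bonus))  # exploit, but keep exploring
            r = pull(arm)                            # bandit feedback for one arm
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]
        return means, counts

Rarely pulled arms keep a large bonus and continue to be tried, while arms with high empirical means get exploited; dropping the bonus (pure exploitation) can lock the learner onto a suboptimal arm forever, which is exactly the failure mode described above.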
References
  • [1] Li Zhou; Emma Brunskill. July 2016. Latent contextual bandits and their application to personalized recommendations for new users. In IJCAI'16, Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. 3646-3653.
  • [2] Mattia Bianchi; Federico Cesaro; Filippo Ciceri; Mattia Dagrada; Alberto Gasparin; Daniele Grattarola; Ilyas Inajjar; Alberto Maria Metelli; Leonardo Cella. August 2017. Content-Based Approaches for Cold-Start Job Recommendations. In RecSys Challenge '17, Proceedings of the Recommender Systems Challenge 2017. https://doi.org/10.1145/3124791.3124793
  • [3] Leonardo Cella; Romaric Gaudel; Paolo Cremonesi. August 2017. Kernelized Collaborative Contextual Bandits. In Proceedings of RecSys 2017 Posters.
  • [4] Leonardo Cella; Stefano Cereda; Massimo Quadrana; Paolo Cremonesi. July 2017. Deriving Item Features Relevance from Past User Interactions. In UMAP '17, Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. 275-279. https://doi.org/10.1145/3079628.3079695
  • [5] Cricia Z. Felicio; Klerisson V. R. Paixao; Celia A. Z. Barcelos; Philippe Preux. July 2017. A Multi-Armed Bandit Model Selection for Cold-Start User Recommendation. In UMAP '17, Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. 32-40. https://doi.org/10.1145/3079628.3079681
  • [6] Michal Valko; Nathaniel Korda; Remi Munos; Ilias Flaounas; Nello Cristianini. August 2013. Finite-Time Analysis of Kernelised Contextual Bandits. In UAI'13, Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence. 654-663.
  • [7] Shuai Li; Alexandros Karatzoglou; Claudio Gentile. July 2016. Collaborative Filtering Bandits. In SIGIR '16, Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval.
  • [8] Haipeng Luo; Alekh Agarwal; Nicolo Cesa-Bianchi; John Langford. December 2016. Efficient Second Order Online Learning by Sketching. In NIPS'16, 30th Conference on Neural Information Processing Systems.
  • [9] Lihong Li; Wei Chu; John Langford; Robert E. Schapire. April 2010. A contextual-bandit approach to personalized news article recommendation. In WWW '10, Proceedings of the 19th International Conference on World Wide Web. 661-670. https://doi.org/10.1145/1772690.1772758
  • [10] Branislav Kveton; Csaba Szepesvari; Anup Rao; Zheng Wen; Yasin Abbasi-Yadkori; S. Muthukrishnan. December 2017. Stochastic Low-Rank Bandits. arXiv:1712.04644.
  • [11] James Bennett; Stan Lanning. 2007. The Netflix Prize. In KDD Cup and Workshop in conjunction with KDD 2007.
  • [12] Nicolo Cesa-Bianchi; Claudio Gentile; Giovanni Zappella. December 2013. A Gang of Bandits. In NIPS'13, Advances in Neural Information Processing Systems 26. 737-745.
  • [13] Abhinandan S. Das; Mayur Datar; Ashutosh Garg; Shyam Rajaram. 2007. Google news personalization: scalable online collaborative filtering. In WWW '07, Proceedings of the 16th International Conference on World Wide Web. 271.
  • [14] Aris Anagnostopoulos; Andrei Z. Broder; Evgeniy Gabrilovich; Vanja Josifovski; Lance Riedel. 2007. Just-in-time contextual advertising. In CIKM '07, Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management.
  • [15] Herbert Robbins. 1952. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58, 527-535.
  • [16] Yasin Abbasi-Yadkori; David Pal; Csaba Szepesvari. December 2011. Improved algorithms for linear stochastic bandits. In NIPS'11, Proceedings of the 24th International Conference on Neural Information Processing Systems.
  • [17] Daniele Calandriello; Alessandro Lazaric; Michal Valko. 2017. Second-Order Kernel Online Convex Optimization with Adaptive Sketching. In ICML'17, Proceedings of the 34th International Conference on Machine Learning.
  • [18] Paul Resnick; Hal R. Varian. March 1997. Recommender Systems. Communications of the ACM 40, 56-58. https://doi.org/10.1145/245108.245121
  • [19] Mina Ghashami; Edo Liberty; Jeff M. Phillips; David P. Woodruff. September 2016. Frequent Directions: simple and deterministic matrix sketching. 1762-1792.
  • [20] G. Linden; B. Smith; J. York. January 2003. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing. 76-80. https://doi.org/10.1109/MIC.2003.1167344
  • [21] Aditya Gopalan; Odalric-Ambrym Maillard; Mohammadi Zaki. September 2016. Low-rank Bandits with Latent Mixtures. arXiv:1609.01508.
  • [22] Claudio Gentile; Shuai Li; Giovanni Zappella. June 2014. Online clustering of bandits. In ICML '14, Proceedings of the 31st International Conference on Machine Learning. 757-765.
  • [23] Claudio Gentile; Shuai Li; Purushottam Kar; Alexandros Karatzoglou; Evans Etrue; Giovanni Zappella. February 2017. On Context-Dependent Clustering of Bandits. In Proceedings of the 34th International Conference on Machine Learning.
Author
Leonardo Cella