Modeling User Exposure in Recommendation
Proceedings of the 25th International Conference on World Wide Web, (2016): 951-961
Collaborative filtering analyzes user preferences for items (e.g., books, movies, restaurants, academic papers) by exploiting the similarity patterns across users. In implicit feedback settings, all the items, including the ones that a user did not consume, are taken into consideration. But this assumption does not accord with the common ...
- Making good recommendations is an important problem on the web. In the recommendation problem, the authors observe how a set of users interacts with a set of items, and the goal is to show each user a set of previously unseen items that she will like.
- Users rate some items and the authors aim to predict their missing ratings.
- Matrix factorization models, which infer user preferences and item attributes by factorizing the click matrix, are standard in recommender systems.
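- Gaussian matrix factorization is: $\theta_u \sim \mathcal{N}(0, \lambda_\theta^{-1} I_K)$, $\beta_i \sim \mathcal{N}(0, \lambda_\beta^{-1} I_K)$, $y_{ui} \sim \mathcal{N}(\theta_u^\top \beta_i, \lambda_y^{-1})$, where $\theta_u$ and $\beta_i$ represent user $u$'s latent preferences and item $i$'s attributes respectively.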
- The extended graphical model with exposure covariates is shown in Figure 1b. Whatever this exposure model looks like, conditional independence between the priors for exposure and the more standard collaborative filtering parameters ensures that the updates for the model we introduced in Section 3.1 will be the same for many popular inference procedures, making the extension to exposure covariates a plug-in procedure.
- Further, when using exposure matrix factorization (ExpoMF) with exposure covariates, we found that performance was improved by predicting missing preferences according to $E[y_{ui} \mid \theta_u, \beta_i]$.
- Studying each model in its respective domain, we demonstrate that the exposure covariates improve the quality of the recommendations compared to ExpoMF with per-item $\mu_i$.
- We presented a novel collaborative filtering mechanism that takes into account user exposure to items.
- In empirical studies, we found that the additional flexibility of our model helps it outperform existing approaches to matrix factorization on four datasets from various domains.
- For each dataset the authors randomly split the observed user-item interactions into training/test/validation sets with 70/20/10 proportions.
- The model is trained following the inference algorithm described in Section 3.3.
- Hyper-parameters for ExpoMF-based models and baseline models are selected according to the same criterion.
- The authors exclude items from the training and validation sets and calculate all the metrics based on the resulting ordered list.
- Further, when using ExpoMF with exposure covariates, the authors found that performance was improved by predicting missing preferences according to $E[y_{ui} \mid \theta_u, \beta_i]$.
- The authors presented a novel collaborative filtering mechanism that takes into account user exposure to items.
- The authors theoretically justify existing approaches that downweight unclicked items for recommendation, and provide an extendable framework for specifying more elaborate models of exposure based on logistic regression.
- In empirical studies the authors found that the additional flexibility of the model helps it outperform existing approaches to matrix factorization on four datasets from various domains.
- The authors seek new ways to capture exposure that include ever more realistic assumptions about how users interact with items.
- Table 1: Attributes of datasets after pre-processing. Interactions are non-zero entries (listening counts, clicks, and check-ins). % interactions refers to the density of the user-item consumption matrix (Y).
- Table 2: Comparison between WMF [8] and ExpoMF. While the differences in performance are generally small, ExpoMF performs comparably better than WMF across datasets.
- Table 3: Comparison between Content ExpoMF and ExpoMF on Mendeley. We also compare collaborative topic regression (CTR) [30], a model that makes use of the same additional information as Content ExpoMF.
- Table 4: Comparison between Location ExpoMF and ExpoMF with per-item μi on Gowalla. Using location exposure covariates outperforms the simpler ExpoMF and WMF according to all metrics.
- In this section we highlight connections between ExpoMF and other similar research directions.
Causal inference. Our work borrows ideas from the field of causal inference [22, 9]. Causal inference aims at understanding and explaining the effect of one variable on another.
One particular aim of causal inference is to answer counterfactual questions, for example, “would this new recommendation engine increase user click-through rate?” While online studies may answer such a question, they are typically expensive even for large electronic commerce companies. Obtaining answers to such questions using observa-
- This work is supported by IIS-1247664, ONR N00014-11-10651, DARPA FA8750-14-2-0009, Facebook, Adobe, Amazon, and the John Templeton Foundation
We highlight that:
• ExpoMF performs comparably better than the state-of-the-art WMF on four datasets representing user clicks, check-ins, bookmarks and listening behavior.
• When augmenting ExpoMF with exposure covariates, its performance is further improved.
We binarize the play counts and interpret them as implicit preference. We further pre-process the dataset by only keeping the users with at least 20 songs in their listening history and songs that are listened to by at least 50 users.
• ArXiv: contains user-paper clicks derived from log data collected in 2012 by the arXiv pre-print server.
• Mendeley: contains user-paper bookmarks as provided by the Mendeley service, a “reference manager”. The behavior data is filtered such that each user has at least 10 papers in her library and the papers that are bookmarked by at least 20 users are kept. In addition
Empirical evaluation. Results comparing ExpoMF to WMF on our four datasets are given in Table 2. Each metric is averaged across all the users
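As an illustration of averaging a ranking metric across users, here is a minimal sketch. It assumes Recall@k as the metric (this excerpt does not name the metrics used), and it excludes items already seen in training and validation before ranking, as described in the evaluation setup. The function name `recall_at_k`, the array names, the toy data, and the `min(k, n_relevant)` normalization are assumptions for illustration only.

```python
import numpy as np

def recall_at_k(scores, held_out, seen, k=2):
    """Mean Recall@k over users that have at least one held-out item.

    scores   : (n_users, n_items) float array of predicted preferences
    held_out : (n_users, n_items) bool array, True for test interactions
    seen     : (n_users, n_items) bool array, True for training/validation interactions
    """
    # Push already-seen items to the bottom of every user's ranking.
    masked = np.where(seen, -np.inf, scores)
    # Column indices of the k highest-scoring remaining items per user.
    top_k = np.argsort(-masked, axis=1)[:, :k]
    hits = np.take_along_axis(held_out, top_k, axis=1).sum(axis=1)
    n_relevant = held_out.sum(axis=1)
    # Normalizing by min(k, n_relevant) is one common convention (assumed here).
    denom = np.minimum(k, n_relevant)
    valid = denom > 0
    return float(np.mean(hits[valid] / denom[valid]))

# Toy example with 2 users and 4 items (made-up numbers).
scores = np.array([[0.9, 0.8, 0.1, 0.4],
                   [0.2, 0.7, 0.6, 0.5]])
seen = np.array([[True, False, False, False],
                 [False, False, False, True]])
held_out = np.array([[False, True, False, False],
                     [True, False, True, False]])

mean_recall = recall_at_k(scores, held_out, seen, k=2)
```

Users with no held-out items are skipped rather than counted as zero, which is a design choice; counting them as zero would lower the average without changing the ranking of models.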
Our model, Content ExpoMF, is trained following Algorithm 1. To update the exposure-related model parameters ψu and γu, we take mini-batch gradient steps with a batch size of 10 users and a constant step size of 0.5 for 10 epochs. Study
We note that CTR’s performance falls in between the performance of ExpoMF and WMF (from Table 2). CTR is particularly well suited to the cold-start case, which is not the data regime we focus on in this study (i.e., recall that we have only kept papers that have been bookmarked by at least 20 users). Figure 3 highlights the behavior of Content ExpoMF compared to that of regular ExpoMF
In doing so, we theoretically justify existing approaches that downweight unclicked items for recommendation, and provide an extendable framework for specifying more elaborate models of exposure based on logistic regression. In empirical studies we found that the additional flexibility of our model helps it outperform existing approaches to matrix factorization on four datasets from various domains. We note that the same approach can also be used to analyze explicit feedback
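To make the idea of a logistic-regression exposure model more concrete, the following is a rough sketch rather than the authors' implementation. It assumes the exposure prior for user u and item i takes the form sigmoid(psi_u · x_i), where x_i is a vector of item covariates and psi_u is a per-user coefficient vector, and it treats the soft exposure targets `P` as given (in a full model they would come from the inference step). The names `X`, `P`, `Psi`, `fit_exposure_coefficients`, and `mean_log_likelihood` are hypothetical. The batch size of 10 users, constant step size of 0.5, and 10 epochs mirror the training setup quoted earlier.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_exposure_coefficients(X, P, batch_size=10, step_size=0.5,
                              n_epochs=10, seed=0):
    """Mini-batch gradient ascent on a per-user logistic exposure model.

    X : (n_items, d) item covariates (assumed given)
    P : (n_users, n_items) soft exposure targets in [0, 1] (assumed given)
    Returns Psi : (n_users, d) per-user coefficients.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = P.shape
    Psi = np.zeros((n_users, X.shape[1]))
    for _ in range(n_epochs):
        order = rng.permutation(n_users)
        for start in range(0, n_users, batch_size):
            batch = order[start:start + batch_size]
            mu = sigmoid(Psi[batch] @ X.T)        # predicted exposure, (b, n_items)
            # Gradient of the mean Bernoulli log-likelihood w.r.t. each psi_u.
            grad = (P[batch] - mu) @ X / n_items  # (b, d)
            Psi[batch] += step_size * grad
    return Psi

def mean_log_likelihood(X, P, Psi):
    """Average Bernoulli log-likelihood of the soft targets under the model."""
    mu = sigmoid(Psi @ X.T)
    eps = 1e-12
    return float(np.mean(P * np.log(mu + eps) + (1 - P) * np.log(1 - mu + eps)))

# Synthetic data for illustration only: 25 users, 200 items, 3 covariates.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_Psi = rng.normal(size=(25, 3))
P = sigmoid(true_Psi @ X.T)

Psi = fit_exposure_coefficients(X, P)
ll_before = mean_log_likelihood(X, P, np.zeros_like(Psi))
ll_after = mean_log_likelihood(X, P, Psi)
```

Because each user has their own coefficients in this sketch, batching over users only groups the updates; every user still receives one gradient step per epoch.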
- T. Bertin-Mahieux, D. P. W. Ellis, B. Whitman, and P. Lamere. The million song dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference, pages 591–596, 2011.
- D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
- L. Bottou, J. Peters, J. Quinonero Candela, D. X. Charles, D. M. Chickering, E. Portugaly, D. Ray, P. Simard, and E. Snelson. Counterfactual reasoning and learning systems: The example of computational advertising. Journal of Machine Learning Research, 14:3207–3260, 2013.
- E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: user movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1082–1090. ACM, 2011.
- A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pages 1–38, 1977.
- A. Gelman and J. Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2006.
- P. K. Gopalan, L. Charlin, and D. Blei. Content-based recommendations with Poisson factorization. In Advances in Neural Information Processing Systems, pages 3176–3184, 2014.
- Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for implicit feedback datasets. In Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on, pages 263–272. IEEE, 2008.
- G. W. Imbens and D. B. Rubin. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press, 2015.
- H. Ishwaran and J. S. Rao. Spike and slab variable selection: frequentist and Bayesian strategies. Annals of Statistics, pages 730–773, 2005.
- Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, Aug. 2009. ISSN 0018-9162.
- D. Lambert. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34(1):1–14, 1992.
- L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, WWW ’10, pages 661–670, New York, NY, USA, 2010.
- G. Ling, H. Yang, M. R. Lyu, and I. King. Response aware model-based collaborative filtering. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA, USA, August 14-18, 2012, pages 501–510, 2012.
- R. J. A. Little and D. B. Rubin. Statistical Analysis with Missing Data. John Wiley & Sons, Inc., New York, NY, USA, 1986. ISBN 0-471-80254-9.
- B. M. Marlin, R. S. Zemel, S. T. Roweis, and M. Slaney. Collaborative filtering and the missing at random assumption. In UAI 2007, Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence, Vancouver, BC, Canada, July 19-22, 2007, pages 267–275, 2007.
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
- A. Mnih and R. Salakhutdinov. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems, pages 1257–1264, 2007.
- R. M. Neal and G. E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models, pages 355–368.
- R. Pan, Y. Zhou, B. Cao, N. N. Liu, R. Lukose, M. Scholz, and Q. Yang. One-class collaborative filtering. In Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on, pages 502–511. IEEE, 2008.
- U. Paquet and N. Koenigstein. One-class collaborative filtering with random graphs. In Proceedings of the 22nd international conference on World Wide Web, pages 999–1008, 2013.
- J. Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, New York, NY, USA, 2nd edition, 2009. ISBN 052189560X, 9780521895606.
- S. Rendle. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, pages 995–1000, 2010.
- S. Rendle and C. Freudenthaler. Improving pairwise learning for item recommendation from implicit feedback. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, pages 273–282. ACM, 2014.
- S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 452–461, 2009.
- D. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688–701, 1974.
- A. F. Smith and G. O. Roberts. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. Journal of the Royal Statistical Society. Series B (Methodological), pages 3–23, 1993.
- A. Swaminathan and T. Joachims. Counterfactual risk minimization. In Proceedings of the 24th International Conference on World Wide Web, WWW ’15 Companion, pages 939–941, 2015.
- M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.
- C. Wang and D. Blei. Collaborative topic modeling for recommending scientific articles. In Knowledge Discovery and Data Mining, 2011.
- J. Weston, S. Bengio, and N. Usunier. WSABIE: Scaling up to large vocabulary image annotation. In Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI, pages 2764–2770, 2011.