Personalized Adaptive Meta Learning for Cold-start User Preference Prediction

Other Links: arxiv.org

Abstract:

A common challenge in personalized user preference prediction is the cold-start problem. Due to the lack of user-item interactions, directly learning from the new users' log data causes a serious over-fitting problem. Recently, many existing studies regard cold-start personalized preference prediction as a few-shot learning problem […]

Introduction
  • Recommender Systems (RS) help people to discover the items they prefer (Guo et al 2017; Qu et al 2016).
  • In order to train a well-performing personalized user preference predictor, enough interactions with users are indispensable.
  • To address this challenge, many researchers take advantage of offline supervised training methods, which leverage historical data to train the model.
  • To train a well-performing model in the cold-start problem, meta learning-based approaches are introduced (Lee et al 2019; Dong et al 2020).
  • In the RS area, meta learning is introduced for the cold-start problem for either users or items, which treats the users/items as tasks, log data as samples, and learns to do fast adaptation when meeting new tasks (Dong et al 2020; Pan et al 2019; Lee et al 2019; Luo et al 2020)
Highlights
  • Recommender Systems (RS) help people to discover the items they prefer (Guo et al 2017; Qu et al 2016)
  • Experiments on MovieLens, BookCrossing, and real-world production datasets reveal that our method outperforms the state-of-the-art methods dramatically for both the minor and major users
  • In this paper, we propose a Personalized Adaptive Meta Learning (PAML) approach to improve the performance of MAML on the cold-start user preference prediction problem
  • Both AT-PAML and REG-PAML have a lower third quartile compared with the SOTA methods, showing that our methods can achieve good-enough performance for most of the cold-start users
  • We propose a novel personalized adaptive meta-learning method to address the user overfitting problem in the cold-start user preference prediction challenge, with three key contributions: 1) we are the first to introduce a personalized adaptive learning rate based meta-learning approach that improves the performance of MAML by focusing on both the major and minor users
Methods
  • 4.1 Adaptive Learning Rate based MAML.
  • The key difference from MAML is that the learning rate (LR) is a mapping from the user embedding to a real number rather than a fixed scalar.
  • With an adaptive learning rate, the meta agent can fit any user, even one far from the meta strategy.
  • An analysis (Lemma 2) illustrates that the adaptive learning rate achieves better results on user-imbalanced datasets; a minimal sketch of the update follows this list.
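The following is a minimal sketch of the adaptive learning rate idea (ours, not the authors' released code; LRNet and inner_adapt are illustrative names): the inner-loop LR is predicted from the user embedding by a small network instead of being a fixed scalar shared by all users.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LRNet(nn.Module):
    """Maps a user embedding to a positive, user-specific learning rate."""
    def __init__(self, emb_dim: int):
        super().__init__()
        self.fc = nn.Linear(emb_dim, 1)

    def forward(self, user_emb: torch.Tensor) -> torch.Tensor:
        # softplus keeps the predicted learning rate strictly positive
        return F.softplus(self.fc(user_emb))

def inner_adapt(model, loss_fn, x_support, y_support, user_emb, lr_net):
    """One MAML-style inner step: theta_i = theta - alpha(h_i) * grad L_i(theta)."""
    alpha = lr_net(user_emb).squeeze()            # adaptive LR for this user
    loss = loss_fn(model(x_support), y_support)   # support-set loss
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    # Functional update; in practice the adapted weights are re-applied to the
    # model (e.g., via torch.func.functional_call) to compute the query loss.
    return [p - alpha * g for p, g in zip(model.parameters(), grads)]
```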
Results
  • In Fig. 5, Meta-SGD has some outliers with large values, which might be why Meta-SGD does not perform well on the BookCrossing dataset.
  • Both AT-PAML and REG-PAML have a lower third quartile compared with the SOTA methods, showing that the methods can achieve good-enough performance for most of the cold-start users
Conclusion
  • The authors propose a novel personalized adaptive meta-learning method to address the user overfitting problem in the cold-start user preference prediction challenge, with three key contributions: 1) the authors are the first to introduce a personalized adaptive learning rate based meta-learning approach that improves the performance of MAML by focusing on both the major and minor users.
  • 3) To reduce memory usage, the authors propose a memory-agnostic regularizer that reduces the space complexity while maintaining the memorizing ability through learning.
  • Experiments on MovieLens-1M, BookCrossing, and a real-world production dataset reveal that the method dramatically outperforms the state-of-the-art methods for both the minor and major users
Summary
  • Objectives:

    Formally speaking, when a new user ui with embedding hi ∈ H arrives, the goal is to find the existing users whose embeddings, and hence interests, are most similar to ui (see the nearest-neighbor sketch below).
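As one hypothetical sketch of this retrieval step (the paper's reference list includes kd-tree and FLANN nearest-neighbor methods, so a tree-based query is a natural fit; all names below are illustrative), similar users can be found by indexing the stored user embeddings and querying the new user's embedding:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
existing_embs = rng.standard_normal((1000, 5))  # embeddings of known users
tree = cKDTree(existing_embs)                   # index built once, offline

h_new = rng.standard_normal(5)                  # embedding of a new user u_i
dists, idxs = tree.query(h_new, k=3)            # 3 most similar known users
print(idxs, dists)
```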
Tables
  • Table1: The ratios and MSEs of the major users and the minor users for the features Age, Gender, Zipcode, and Occupation
  • Table2: Comparison of different methods on the MovieLens and BookCrossing datasets. The best results are highlighted in bold and the second-best results are in italic. Avg means average. The mean and standard deviation are reported over 3 independent trials. ∗ denotes a statistically significant improvement over the best baseline method (measured by t-test with p-value < 0.05)
  • Table3: Ablations of different methods on MovieLens-1M. The best results are highlighted in bold and the second-best results are in italic. Avg means average. The results are reported over 3 independent trials. ∗ denotes a statistically significant improvement over PAML (measured by t-test with p-value < 0.05)
  • Table4: MSEs for MAjor Users (MAU) and Minor Users (MU) on MovieLens. The best results are highlighted in bold and the second-best results are in italic. The p-value is drawn from a two-tailed Student's t-test between the minor-user MSEs and the major-user MSEs of the same method
  • Table5: Ablations of different methods for MAjor Users (MAU) and Minor Users (MU) on MovieLens. The best results are highlighted in bold and the second-best results are in italic. Avg means average. The p-value is drawn from a two-tailed Student's t-test between the minor-user MSEs and the major-user MSEs of the same method
  • Table6: Statistics of the MovieLens-1M, BookCrossing, and Taobao datasets
  • Table7: Hyper-parameters for PAML, AT-PAML and REG-PAML. d is 5 for MovieLens, 10 for BookCrossing, and 2 for the production dataset
Related work
  • In this section, we discuss related work, including gradient-based meta learning for imbalanced datasets and meta learning for cold-start recommendation.

    Gradient-Based Meta Learning for the Task Overfitting Problem. MAML-based methods have been widely adopted for the few-shot learning problem (Finn, Abbeel, and Levine 2017; Li et al 2017; Xu, van Hasselt, and Silver 2018; Chen et al 2018; Ravi and Larochelle 2016; Lee and Choi 2018). To address task-adaptive challenges, prior work designs a vector of learning rates (Li et al 2017), a block-diagonal preconditioning matrix (Park and Oliva 2019), latent embedding optimization (Rusu et al 2018), and interleaved warp-layers (Flennerhag et al 2020).

    [Table 1: for each feature (Age, Gender, Zipcode, Occupation), the ratio and MSE of the major users and of the minor users. The ratio of feature values is computed as (number of users having certain feature values) / (total number of users) over the top 30% largest feature-value groups that users own, and the ratio of certain users as (number of users in a certain group) / (total number of users).]
Funding
  • This work was supported by Alibaba Group through the Alibaba Innovative Research (AIR) Program and the Alibaba-NTU Joint Research Institute (JRI), Nanyang Technological University, Singapore. The authors would like to thank Suming Yu, Zhenyu Shi, Rundong Wang, Xinrun Wang, Feifei Lin, Aye Phyu Phyu Aung, Hanzao Chen, Ziwen Jiang, Yi Cao, and Yufei Feng for their help.
Study subjects and analysis
cold-start users: 4
To illustrate the problem clearly, we regard a collection of users with similar feature values as a group, and define users as major users when the number of users in their group is large; the other users are minor users. Now we give an example of how an imbalanced distribution harms the performance of MAML: as shown in Fig. 1, assume that an MAML strategy aims to learn to do fast adaptation for four cold-start users (three major users, users 1-3, and one minor user, user 4). Different locations in the blue square indicate feature values (embeddings) of different users (a grouping sketch follows).
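A small illustrative sketch of this grouping rule (variable names and the size cutoff are ours, not from the paper), reproducing the three-major/one-minor example above:

```python
import numpy as np
import pandas as pd

# Users 1-3 share the same feature values; user 4 is alone in its group.
users = pd.DataFrame({
    "age":    [25, 25, 25, 60],
    "gender": ["F", "F", "F", "M"],
})

# Size of each (age, gender) group, broadcast back to every user row.
group_size = users.groupby(["age", "gender"])["age"].transform("size")

threshold = 2  # hypothetical cutoff on group size
users["kind"] = np.where(group_size >= threshold, "major", "minor")
print(users)   # users 1-3 -> major, user 4 -> minor
```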

References
  • Aljundi, R.; Babiloni, F.; Elhoseiny, M.; Rohrbach, M.; and Tuytelaars, T. 2018. Memory aware synapses: Learning what (not) to forget. In ECCV, 139–154.
  • Bharadhwaj, H. 2019. Meta-learning for user cold-start recommendation. In IJCNN, 1–8.
  • Chen, F.; Luo, M.; Dong, Z.; Li, Z.; and He, X. 2018. Federated meta-learning with fast convergence and efficient communication. arXiv preprint arXiv:1802.07876.
  • Dong, M.; Yuan, F.; Yao, L.; Xu, X.; and Zhu, L. 2020. MAMO: Memory-augmented meta-optimization for cold-start recommendation. In KDD, 688–697.
  • Finn, C.; Abbeel, P.; and Levine, S. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, 1126–1135.
  • Finn, C.; Yu, T.; Zhang, T.; Abbeel, P.; and Levine, S. 2017. One-shot visual imitation learning via meta-learning. In CoRL, 357–368.
  • Flennerhag, S.; Rusu, A. A.; Pascanu, R.; Yin, H.; and Hadsell, R. 2020. Meta-learning with warped gradient descent. In ICLR.
  • Friedman, J. H.; Bentley, J. L.; and Finkel, R. A. 1977. An algorithm for finding best matches in logarithmic expected time. TOMS 3(3): 209–226.
  • Gretton, A.; Borgwardt, K. M.; Rasch, M. J.; Schölkopf, B.; and Smola, A. 2012. A kernel two-sample test. JMLR 13: 723–773.
  • Gui, L.-Y.; Wang, Y.-X.; Ramanan, D.; and Moura, J. M. 2018. Few-shot human motion prediction via meta-learning. In ECCV, 432–450.
  • Guo, H.; Tang, R.; Ye, Y.; Li, Z.; and He, X. 2017. DeepFM: A factorization-machine based neural network for CTR prediction. In IJCAI, 1725–1731.
  • Guo, Q.; Li, Z.; An, B.; Hui, P.; Huang, J.; Zhang, L.; and Zhao, M. 2019. Securing the deep fraud detector in large-scale e-commerce platform via adversarial machine learning approach. In WWW, 616–626.
  • Harper, F. M.; and Konstan, J. A. 2015. The MovieLens datasets: History and context. TIIS 5(4): 1–19.
  • Kingma, D. P.; and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A. A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. 2017. Overcoming catastrophic forgetting in neural networks. PNAS 114(13): 3521–3526.
  • Lee, H.; Im, J.; Jang, S.; Cho, H.; and Chung, S. 2019. MeLU: Meta-learned user preference estimator for cold-start recommendation. In KDD, 1073–1082.
  • Lee, Y.; and Choi, S. 2018. Gradient-based meta-learning with learned layerwise metric and subspace. In ICML, 2933–2942.
  • Li, Z.; Zhou, F.; Chen, F.; and Li, H. 2017. Meta-SGD: Learning to learn quickly for few-shot learning. arXiv preprint arXiv:1707.09835.
  • Lin, X.; Chen, H.; Pei, C.; Sun, F.; Xiao, X.; Sun, H.; Zhang, Y.; Ou, W.; and Jiang, P. 2019. A Pareto-efficient algorithm for multiple objective optimization in e-commerce recommendation. In RecSys, 20–28.
  • Luo, M.; Chen, F.; Cheng, P.; Dong, Z.; He, X.; Feng, J.; and Li, Z. 2020. MetaSelector: Meta-learning for recommendation with user-level adaptive model selection. In WWW, 2507–2513.
  • Maaten, L. v. d.; and Hinton, G. 2008. Visualizing data using t-SNE. JMLR 9: 2579–2605.
  • Madotto, A.; Lin, Z.; Wu, C.-S.; and Fung, P. 2019. Personalizing dialogue agents via meta-learning. In ACL, 5454–5459.
  • Muja, M.; and Lowe, D. 2013. FLANN: Fast Library for Approximate Nearest Neighbors User Manual. University of British Columbia.
  • Muja, M.; and Lowe, D. G. 2014. Scalable nearest neighbor algorithms for high dimensional data. PAMI 36(11): 2227–2240.
  • Pan, F.; Li, S.; Ao, X.; Tang, P.; and He, Q. 2019. Warm up cold-start advertisements: Improving CTR predictions via learning to learn ID embeddings. In SIGIR, 695–704.
  • Park, E.; and Oliva, J. B. 2019. Meta-curvature. In NeurIPS, 3309–3319.
  • Pennington, J.; Socher, R.; and Manning, C. D. 2014. GloVe: Global vectors for word representation. In EMNLP, 1532–1543.
  • Qu, Y.; Cai, H.; Ren, K.; Zhang, W.; Yu, Y.; Wen, Y.; and Wang, J. 2016. Product-based neural networks for user response prediction. In ICDM, 1149–1154.
  • Rajeswaran, A.; Finn, C.; Kakade, S. M.; and Levine, S. 2019. Meta-learning with implicit gradients. In NeurIPS, 113–124.
  • Ravi, S.; and Larochelle, H. 2016. Optimization as a model for few-shot learning. In ICLR.
  • Ren, K.; Qin, J.; Fang, Y.; Zhang, W.; Zheng, L.; Bian, W.; Zhou, G.; Xu, J.; Yu, Y.; Zhu, X.; et al. 2019. Lifelong sequential modeling with personalized memorization for user response prediction. In SIGIR, 565–574.
  • Rusu, A. A.; Rao, D.; Sygnowski, J.; Vinyals, O.; Pascanu, R.; Osindero, S.; and Hadsell, R. 2018. Meta-learning with latent embedding optimization. In ICLR.
  • Schölkopf, B.; Smola, A. J.; Bach, F.; et al. 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.
  • Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; and Liu, C. 2018. A survey on deep transfer learning. In ICANN, 270–279.
  • Vanschoren, J. 2018. Meta-learning: A survey. arXiv preprint arXiv:1810.03548.
  • Vartak, M.; Thiagarajan, A.; Miranda, C.; Bratman, J.; and Larochelle, H. 2017. A meta-learning perspective on cold-start recommendations for items. In NeurIPS, 6904–6914.
  • Xu, Z.; van Hasselt, H. P.; and Silver, D. 2018. Meta-gradient reinforcement learning. In NeurIPS, 2396–2407.
  • Yianilos, P. N. 1993. Data structures and algorithms for nearest neighbor search in general metric spaces. In SODA, 311–321.
  • Zhao, M.; Li, Z.; An, B.; Lu, H.; Yang, Y.; and Chu, C. 2018. Impression allocation for combating fraud in e-commerce via deep reinforcement learning with action norm penalty. In IJCAI, 3940–3946.
  • Ziegler, C.-N.; McNee, S. M.; Konstan, J. A.; and Lausen, G. 2005. Improving recommendation lists through topic diversification. In WWW, 22–32.
Appendix
  • Remark 1. REG-PAML can be explained as an approximated proximal regularization. Recall the inner-loop optimizer in (Rajeswaran et al. 2019): $\theta_i = \arg\min_{\theta'} \mathcal{L}_i(\theta') + \frac{\lambda}{2}\lVert \theta' - \theta \rVert^2$, where $\theta$ is the meta-initialization and $\lambda$ weights the proximal term that keeps the adapted parameters $\theta_i$ close to it.
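As a worked step (our annotation, not from the paper): a single gradient step on this proximal objective makes the pull-back effect of the regularizer explicit.

```latex
% One inner-loop gradient step on the proximal objective, starting from
% the iterate \theta^{(k)} with inner learning rate \alpha:
\theta^{(k+1)}
  = \theta^{(k)} - \alpha \nabla_{\theta'} \Big[ \mathcal{L}_i(\theta')
      + \tfrac{\lambda}{2} \lVert \theta' - \theta \rVert^2 \Big]_{\theta' = \theta^{(k)}}
  = \theta^{(k)} - \alpha \nabla \mathcal{L}_i(\theta^{(k)})
      - \alpha \lambda \big( \theta^{(k)} - \theta \big)
```

Compared with the plain MAML inner update, the extra term $-\alpha\lambda(\theta^{(k)} - \theta)$ pulls the adapted parameters back toward the meta-initialization, which is the behavior Remark 1 attributes to REG-PAML in approximation.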
  • This section discusses how a kernel-based method can calculate similarity. Formally speaking, when a new user $u_i$ with embedding $h_i \in \mathcal{H}$ arrives, our goal is to find the users with the most similar interest to $u_i$. We first define an (oracle) classifier $g(h): \mathcal{H} \to \mathbb{R}$, which maps the embedding to a certain group (e.g., the fishing-enthusiast group). That is, the classifier $g(h)$ gives the same output for users with similar interest, so the model can find which users are similar to the new user $u_i$. However, finding this classifier is not an easy task. Inspired by the Maximum Mean Discrepancy (MMD) approach (Gretton et al. 2012), which converts the problem of finding the classifier into calculating a distance between probability distributions, we build the objective as

    $$s := \inf_{j \in U} \Big\lVert \sup_{g \in \mathcal{G}} \sum_{l} \big( g(h_i^l) - g(h_j^l) \big) \Big\rVert^2, \qquad (3)$$

    where $\mathcal{G} = \{ g : \lVert g \rVert_{\mathcal{H}} \le 1 \}$ is the set of classifier functions ($\lVert \cdot \rVert_{\mathcal{H}}$ is the norm in the Hilbert space $\mathcal{H}$) and $h^l$ is the value of the $l$-th basis ($h = (h^l)_{l=1}^{M} \in \mathbb{R}^M$, with $M$ the embedding size). The quantity $\lVert \sup_{g \in \mathcal{G}} \sum_l ( g(h_i^l) - g(h_j^l) ) \rVert$ is exactly the MMD (Gretton et al. 2012), revealing the disparity between two distributions. In our setting, Eq. (3) can be interpreted as finding the users whose interest is most similar to user $i$ in the embedding space, if we regard each embedding component as a random variable.
  • With Riesz's representation theorem (Schölkopf et al. 2002), we have $g(h) = \langle g, \phi(h) \rangle$, where $\phi(h)$ is the feature map on $\mathcal{H}$. Applying an approach similar to (Gretton et al. 2012), we obtain $s = \inf_{j \in U} \lVert \sup_{g \in \mathcal{G}} \langle g, \phi(h_i) - \phi(h_j) \rangle \rVert = \inf_{j \in U} \lVert \phi(h_i) - \phi(h_j) \rVert_{\mathcal{H}}$, where …
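This final form is directly computable with the kernel trick. A minimal sketch (ours; the RBF kernel and bandwidth gamma are illustrative choices, not from the paper): since $\lVert \phi(a) - \phi(b) \rVert_{\mathcal{H}}^2 = k(a,a) - 2k(a,b) + k(b,b)$, the most similar user is the argmin of this distance over the stored users.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def most_similar_user(h_new, user_embs, gamma=1.0):
    """Index j minimizing ||phi(h_new) - phi(h_j)||_H^2, via the kernel trick."""
    d2 = [rbf(h_new, h_new, gamma) - 2.0 * rbf(h_new, h, gamma) + rbf(h, h, gamma)
          for h in user_embs]
    return int(np.argmin(d2))

user_embs = np.random.randn(100, 5)   # embeddings of existing users
h_new = np.random.randn(5)            # embedding of the new user u_i
print(most_similar_user(h_new, user_embs))
```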