Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation
SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020, pp. 1469-1478.
Abstract:
Inductive transfer learning has had a big impact on computer vision and NLP domains but has not been used in the area of recommender systems. Even though there has been a large body of research on generating recommendations based on modeling user-item interaction sequences, few of them attempt to represent and transfer these models for serving downstream tasks where only limited data exists.
Introduction
- The last 10 years have seen the ever-increasing use of social media platforms and e-commerce systems, such as TikTok, Amazon, or Netflix.
- Most past work has focused on recommending items on the same platform the data came from.
- Few of these methods exploit this data to learn a universal user representation that could be used for a different downstream task, such as the cold-start user problem on a different recommendation platform or the prediction of a user profile.
Highlights
- The last 10 years have seen the ever-increasing use of social media platforms and e-commerce systems, such as TikTok, Amazon, or Netflix
- We propose a universal user representation learning architecture, a method that can be used to achieve NLP- or computer vision (CV)-like transfer learning for various downstream tasks
- To evaluate the performance of PeterRec in the downstream tasks, we randomly split the target dataset into training (70%), validation (3%) and testing (27%) sets
- PeterRec outperforms MTL in all tasks, which implies that the proposed two-stage pre-training & fine-tuning paradigm is more powerful than the joint training in MTL. We argue this is because the optimal parameters learned for the two objectives in MTL do not guarantee optimal performance for fine-tuning
- We have shown that (1) it is possible to learn universal user representations by modeling only unsupervised user sequential behaviors; and (2) it is possible to adapt the learned representations for a variety of downstream tasks
- The core idea of learning-to-learn is that the parameters of one deep neural network can be predicted from another [4, 26]; [6] demonstrated that it is possible to predict more than 95% of the parameters of a network layer given the remaining 5%
- By releasing both high-quality datasets and code, we hope PeterRec serves as a benchmark for transfer learning in the recommender system domain
Methods
- Having developed the model patch architecture, the question is how to inject it into the existing dilated convolution (DC) blocks; a minimal sketch of one serial insertion option follows the figure notes below.
- (Figure 3 residue: block diagrams comparing (a) the original DC block, which stacks Input → DC Layer 1×3 → Layer-Norm → ReLU twice, with serial variants (b) and (c), which insert a model patch (MP) after a normalization or activation stage inside the block.)
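A minimal PyTorch sketch of the serial-insertion idea, assuming a bottleneck-style model patch; the names (ModelPatch, DCBlockWithPatch, bottleneck_dim) are illustrative, not the authors' released code:

```python
import torch
import torch.nn as nn

class ModelPatch(nn.Module):
    """Bottleneck adapter: project channels down, apply ReLU, project back up."""
    def __init__(self, channels: int, bottleneck_dim: int = 8):
        super().__init__()
        self.down = nn.Conv1d(channels, bottleneck_dim, kernel_size=1)
        self.up = nn.Conv1d(bottleneck_dim, channels, kernel_size=1)

    def forward(self, x):
        # The residual connection keeps the patch close to an identity map,
        # so the pre-trained block's behavior is preserved at initialization.
        return x + self.up(torch.relu(self.down(x)))

class DCBlockWithPatch(nn.Module):
    """A pre-trained dilated-convolution (DC) layer with a serially inserted patch."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.dc = nn.Conv1d(channels, channels, kernel_size=3,
                            dilation=dilation, padding=dilation)
        self.norm = nn.LayerNorm(channels)
        self.patch = ModelPatch(channels)

    def forward(self, x):  # x: (batch, channels, seq_len)
        h = self.dc(x)
        # LayerNorm normalizes over channels, so move them to the last axis.
        h = self.norm(h.transpose(1, 2)).transpose(1, 2)
        h = torch.relu(h)
        return x + self.patch(h)  # patch applied serially after the ReLU

# Fine-tune only the patches: freeze every pre-trained parameter.
block = DCBlockWithPatch(channels=64, dilation=2)
for name, p in block.named_parameters():
    p.requires_grad = name.startswith("patch")
```

Because only the 1×1 projections in each patch (plus a new task head) are updated, the number of fine-tuned parameters stays a small fraction of the full network, consistent with the 0.16%-9.4% figures reported in Table 3.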
Results
- To evaluate the performance of PeterRec in the downstream tasks, the authors randomly split the target dataset into training (70%), validation (3%) and testing (27%) sets.
- Note that to speed up the experiments on the item recommendation tasks, the authors follow the common strategy in [13] of randomly sampling negative examples for each true example and evaluating top-5 accuracy among the sampled items (a sketch of this protocol follows the list below).
- To answer RQ1, the authors compare PeterRec in two cases: well-pre-trained and non-pre-trained settings.
- The authors refer to PeterRec with randomly initialized weights as PeterZero
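A small sketch of the sampled-negative evaluation, assuming 99 negatives per true example (the exact count is not stated in this summary); score_fn and the other names are hypothetical:

```python
import numpy as np

def top5_accuracy(score_fn, eval_pairs, all_items, n_negatives=99, seed=0):
    """Top-5 accuracy with sampled negatives: a hit is counted when the
    true item ranks among the top-5 scored candidates."""
    rng = np.random.default_rng(seed)
    hits = 0
    for user, true_item in eval_pairs:
        # Sample items other than the true one as negatives.
        pool = np.array([i for i in all_items if i != true_item])
        negatives = rng.choice(pool, size=n_negatives, replace=False)
        candidates = np.append(negatives, true_item)
        scores = np.asarray(score_fn(user, candidates))  # model scores per item
        top5 = candidates[np.argsort(-scores)[:5]]
        hits += int(true_item in top5)
    return hits / len(eval_pairs)
```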
Conclusion
- The authors have shown that (1) it is possible to learn universal user representations by modeling only unsupervised user sequential behaviors; and (2) it is possible to adapt the learned representations for a variety of downstream tasks.
- The authors have evaluated several alternative designs of PeterRec and made insightful observations through extensive ablation studies
- By releasing both high-quality datasets and code, the authors hope PeterRec serves as a benchmark for transfer learning in the recommender system domain.
- Given the video-watching behaviors of a teenager, the authors may learn via PeterRec whether they show signs of depression or a propensity for violence, without resorting to much feature engineering or human-labeled data.
- The authors may explore PeterRec on more tasks
Objectives
- The authors aim to demonstrate how to modify the pre-trained network to obtain better accuracy on related but very different tasks by training only a few parameters.
Tables
- Table 1: Number of instances. Each instance in S and T represents a (u, x^u) and a (u, y) pair, respectively. The number of source items |X| = 191K, 645K, 645K, 645K, 645K (K = 1000), and the number of target labels |Y| = 20K, 17K, 2, 8, 6 for the five datasets from left to right. M = 1000K
- Table 2: Impact of pre-training: FineZero vs. FineAll (with the causal CNN architectures). Unless otherwise mentioned, in the following we report only ColdRec-1 with HR@5 and ColdRec-2 with MRR@5 for demonstration
- Table 3: Performance comparison (with the non-causal CNN architectures). The number of fine-tuned parameters of PeterRec accounts for 9.4%, 2.7%, 0.16%, 0.16%, 0.16% of FineAll on the five datasets from left to right
- Table 4: Results regarding user profile prediction
- Table 5: Top-5 accuracy in the cold-user scenario
- Table 6: PeterRecal vs. PeterRecon. The results of the first and last two columns are on the ColdRec-1 and AgeEst datasets, respectively
- Table 7: Performance of the different insertions of Figure 3 on AgeEst
Related work
- PeterRec tackles two research questions: (1) training an effective and efficient base model, and (2) transferring the learned user representations from the base model to downstream tasks with a high degree of parameter sharing. Since we choose sequential recommendation models to perform this upstream task, we briefly review the related literature. Then we recapitulate work on transfer learning and user representation adaptation.
2.1 Sequential Recommendation Models
A sequential recommendation (SR) model takes in a sequence (session) of user-item interactions and, taking each item of the sequence in turn as input, aims to predict the next one(s) that the user will like. SR models have demonstrated clear accuracy gains over traditional content- or context-based recommendation when modeling users' sequential actions [18]. Another merit of SR is that sequential models do not necessarily require user profile information, since user representations can be implicitly reflected by users' past sequential behaviors. Among these models, researchers have paid special attention to three lines of work: RNN-based [14], CNN-based [34, 39, 40], and pure attention-based [18] sequential models. In general, typical RNN models strictly rely on sequential dependencies during training, and thus cannot take full advantage of modern computing architectures such as GPUs or TPUs [40]. CNN- and attention-based recommendation models do not have such a problem, since the entire sequence can be observed during training and computation can therefore be fully parallelized. One well-known obstacle that prevents CNNs from being strong sequential models is the limited receptive field due to their small kernel size (e.g., 3×3). This issue has been cleverly addressed by introducing the dilated convolution operation, which enables an exponentially increased receptive field with an unchanged kernel size [39, 40]; a small calculation illustrating this growth follows this paragraph. By contrast, self-attention-based sequential models such as SASRec [18] may have time-complexity and memory issues, since these grow quadratically with the sequence length. We therefore choose dilated convolution-based sequential neural networks to build the pre-trained model, investigating both causal (i.e., NextItNet [40]) and non-causal (i.e., the bidirectional encoder of GRec [39]) convolutions in this paper.
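To make the receptive-field argument concrete, here is a small calculation: a sketch, assuming kernel size 3 and NextItNet-style doubled dilations (1, 2, 4, 8):

```python
def receptive_field(kernel_size: int, dilations: list) -> int:
    """Receptive field of stacked 1-D dilated convolutions: each layer
    adds (kernel_size - 1) * dilation new input positions."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field

print(receptive_field(3, [1, 2, 4, 8]))  # 31: doubling dilations grows the field exponentially with depth
print(receptive_field(3, [1, 1, 1, 1]))  # 9: plain (undilated) convolutions grow only linearly
```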
Funding
- This work is partly supported by the National Natural Science Foundation of China (61972372, U19A2079)
References
- John Anderson, Qingqing Huang, Walid Krichene, Steffen Rendle, and Li Zhang. 2020. Superbloom: Bloom filter meets Transformer. arXiv preprint arXiv:2002.04723 (2020).
- Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016).
- Yoshua Bengio and Samy Bengio. 2000. Modeling high-dimensional discrete data with multi-layer neural networks. In Advances in Neural Information Processing Systems. 400–406.
- Luca Bertinetto, João F Henriques, Jack Valmadre, Philip Torr, and Andrea Vedaldi. 2016. Learning feed-forward one-shot learners. In Advances in Neural Information Processing Systems. 523–531.
- Chong Chen, Min Zhang, Chenyang Wang, Weizhi Ma, Minming Li, Yiqun Liu, and Shaoping Ma. 2019. An Efficient Adaptive Transfer Neural Network for Social-aware Recommendation. (2019).
- Misha Denil, Babak Shakibi, Laurent Dinh, Marc’Aurelio Ranzato, and Nando De Freitas. 2013. Predicting parameters in deep learning. In Advances in neural information processing systems. 2148–2156.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
- Guibing Guo, Shichang Ouyang, Xiaodong He, Fajie Yuan, and Xiaohua Liu. 2019. Dynamic item block and prediction enhancing block for sequential recommendation. International Joint Conferences on Artificial Intelligence Organization.
- Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: a factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247 (2017).
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR. 770–778.
- Xiangnan He and Tat-Seng Chua. 2017. Neural factorization machines for sparse predictive analytics. In Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 355–364.
- Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (2020).
- Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 173-182.
- Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015).
- Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-Efficient Transfer Learning for NLP. arXiv preprint arXiv:1902.00751 (2019).
- Guangneng Hu, Yu Zhang, and Qiang Yang. 2018. Conet: Collaborative cross networks for cross-domain recommendation. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 667– 676.
- Guangneng Hu, Yu Zhang, and Qiang Yang. 2018. MTNet: a neural approach for cross-domain recommendation with unstructured text.
- Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 197-206.
- Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
- Wenqiang Lei, Xiangnan He, Yisong Miao, Qingyun Wu, Richang Hong, Min-Yen Kan, and Tat-Seng Chua. 2020. Estimation-Action-Reflection: Towards deep interaction between conversational and recommender systems. In Proceedings of the 13th International Conference on Web Search and Data Mining. 304-312.
- Pramod Kaushik Mudrakarta, Mark Sandler, Andrey Zhmoginov, and Andrew Howard. 2018. K For The Price Of 1: Parameter Efficient Multi-task And Transfer Learning. arXiv preprint arXiv:1810.10703 (2018).
- Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In ICML. 807-814.
- Yabo Ni, Dan Ou, Shichen Liu, Xiang Li, Wenwu Ou, Anxiang Zeng, and Luo Si. 2018. Perceive your users in depth: Learning universal user representations from multiple e-commerce tasks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 596-605.
- Shilin Qu, Fajie Yuan, Guibing Guo, Liguang Zhang, and Wei Wei. 2020. CmnRec: Sequential Recommendations with Chunk-accelerated Memory Network. arXiv preprint arXiv:2004.13401 (2020).
- Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. [n.d.]. Improving Language Understanding by Generative Pre-Training.
- Sylvestre-Alvise Rebuffi, Hakan Bilen, and Andrea Vedaldi. 2017. Learning multiple visual domains with residual adapters. In Advances in Neural Information Processing Systems. 506-516.
- Sylvestre-Alvise Rebuffi, Hakan Bilen, and Andrea Vedaldi. 2018. Efficient parametrization of multi-domain deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8119-8127.
- Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. AUAI Press, 452-461.
- Steffen Rendle, Walid Krichene, Li Zhang, and John Anderson. 2020. Neural Collaborative Filtering vs. Matrix Factorization Revisited. arXiv preprint arXiv:2005.09683 (2020).
- Amir Rosenfeld and John K Tsotsos. 2018. Incremental learning through deep adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence (2018).
- Asa Cooper Stickland and Iain Murray. 2019. BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning. arXiv preprint arXiv:1902.02671 (2019).
- Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, and Jifeng Dai. 2019. VL-BERT: Pre-training of generic visual-linguistic representations. arXiv preprint arXiv:1908.08530 (2019).
- Yang Sun, Fajie Yuan, Ming Yang, Guoao Wei, Zhou Zhao, and Duo Liu. 2020. A Generic Network Compression Framework for Sequential Recommender Systems. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (2020).
- Jiaxi Tang and Ke Wang. 2018. Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding. In ACM International Conference on Web Search and Data Mining.
- Jingyi Wang, Qiang Liu, Zhaocheng Liu, and Shu Wu. 2019. Towards Accurate and Interpretable Sequential Prediction: A CNN & Attention-Based Feature Extractor. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. ACM, 1703-1712.
- Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 2017. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1492-1500.
- Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. How transferable are features in deep neural networks?. In Advances in Neural Information Processing Systems. 3320-3328.
- Fajie Yuan, Guibing Guo, Joemon M Jose, Long Chen, Haitao Yu, and Weinan Zhang. 2016. LambdaFM: learning optimal ranking with factorization machines using lambda surrogates. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, 227-236.
- Fajie Yuan, Xiangnan He, Haochuan Jiang, Guibing Guo, Jian Xiong, Zhezhao Xu, and Yilin Xiong. 2020. Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation. In Proceedings of The Web Conference 2020. 303-313.
- Fajie Yuan, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M Jose, and Xiangnan He. 2019. A Simple Convolutional Generative Network for Next Item Recommendation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. ACM, 582-590.
- Fajie Yuan, Xin Xin, Xiangnan He, Guibing Guo, Weinan Zhang, Chua Tat-Seng, and Joemon M Jose. 2018. fBGD: Learning embeddings from positive unlabeled data with BGD. (2018).
- Feng Yuan, Lina Yao, and Boualem Benatallah. 2019. DARec: Deep Domain Adaptation for Cross-Domain Recommendation via Transferring Rating Patterns. arXiv preprint arXiv:1905.10760 (2019).
- Kui Zhao, Yuechuan Li, Zhaoqian Shuai, and Cheng Yang. 2018. Learning and Transferring IDs Representation in E-commerce. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1031-1039.