# A Generic Network Compression Framework for Sequential Recommender Systems

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020, pp. 1299–1308.

Abstract:

Sequential recommender systems (SRS) have become the key technology for capturing users' dynamic interests and generating high-quality recommendations. Current state-of-the-art sequential recommender models are typically based on a sandwich-structured deep neural network, where one or more middle (hidden) layers are placed between the input…


Introduction

- Sequential (a.k.a. session-based) recommender systems (SRS) have become a research hotspot in the recommendation field. This is because user interaction behaviors in real-life scenarios often exist in the form of chronological sequences.
- In such scenarios, traditional RS based on collaborative filtering [25] or content features [21] fail to model users' dynamic interests and offer only sub-optimal performance.
- As shown in Figure 1, the prediction accuracy of the sequential recommender model NextItNet [42] can be largely improved by increasing its model size, i.e., using a larger embedding dimension or a deeper network architecture.
- NextItNet obtains more than 20% accuracy gains by increasing d from 64 to 512, at the cost of roughly 3× more parameters
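As a rough illustration of why a larger embedding dimension inflates model size, the sketch below counts only the input and (untied) softmax embedding tables, for a hypothetical catalog of one million items; the overall ~3× growth quoted above is smaller than the tables' growth because the middle layers scale differently.

```python
# Back-of-the-envelope parameter count for the two embedding tables of a
# sequential recommender. The vocabulary size (1M items) is hypothetical.
def embedding_params(num_items: int, dim: int) -> int:
    # One input embedding table plus one untied softmax weight matrix.
    return 2 * num_items * dim

small = embedding_params(1_000_000, 64)   # d = 64
large = embedding_params(1_000_000, 512)  # d = 512
print(small, large, large / small)        # the tables alone grow 8x with d
```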

Highlights

- Sequential (a.k.a. session-based) recommender systems (SRS) have become a research hotspot in the recommendation field. This is because user interaction behaviors in real-life scenarios often exist in the form of chronological sequences
- Sequential recommender models based on recurrent neural networks (RNN) [14, 23] or convolutional neural networks (CNN) (often with dilated kernels) [42] have obtained state-of-the-art performance, since these models are more powerful in capturing sequential dependencies in the interaction sequence
- A well-known observation is that item frequencies generally obey a long-tailed distribution [33, 42], where some "head" items have a large number of user interactions, yet only a few interactions are available for the "tail" items
- We have proposed CpRec, a flexible and generic neural network compression framework for learning compact sequential recommender models
- An important conclusion made from these results is that the commonly used recommender models are not compact at all
- We expect CpRec to be valuable for existing sequential recommender systems (SRS) based on deep neural networks

Methods

- The authors present two main model compression techniques to improve the parameter efficiency of SRS.
- An obvious difficulty is that if the authors assign adaptive (i.e., variable-sized) embeddings to items, these cannot be directly trained by the typical sequential recommender model due to inconsistent dimensions of the middle layers.
- To this end, the authors perform dimension transformation by multiplying a projection matrix.
- The transformation process is equivalent to a low-rank factorization given that the original large embedding matrix is reconstructed by two smaller matrices
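The dimension-transformation idea described above can be sketched as follows; the partition sizes, embedding dimensions, and random matrices are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Illustrative sketch of adaptive (variable-sized) embeddings with a
# projection back to the hidden dimension d. All sizes are hypothetical.
d = 64                                     # dimension expected by the middle layers
head_emb = np.random.randn(1_000, d)       # 1K frequent "head" items, full dim
tail_emb = np.random.randn(9_000, d // 4)  # 9K rare "tail" items, dim 16
proj = np.random.randn(d // 4, d)          # projection matrix for the tail block

def lookup(item_id: int) -> np.ndarray:
    """Return a d-dimensional embedding regardless of the item's block."""
    if item_id < 1_000:                    # head partition: use directly
        return head_emb[item_id]
    # Tail partition: small embedding times projection, i.e. a low-rank
    # factorization of the (never materialized) full 9000 x 64 matrix.
    return tail_emb[item_id - 1_000] @ proj

full_table = 10_000 * d                                   # one big table
factorized = 1_000 * d + 9_000 * (d // 4) + (d // 4) * d  # head + tail + proj
print(lookup(5_000).shape, full_table, factorized)
```

Every item ends up with a d-dimensional vector, so the middle layers see consistent dimensions, while the tail block stores far fewer parameters than a full-dimension table would.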

Results

- In order to evaluate the recommendation accuracy of CpRec, the authors randomly split all datasets into training (80%) and testing (20%) sets.
- Following previous works [13, 14], the authors use the popular top-N metrics, including MRR@N (Mean Reciprocal Rank), HR@N (Hit Ratio) and NDCG@N (Normalized Discounted Cumulative Gain), where N is set to 5 and 20.
- To evaluate the parameter efficiency, the authors report the parameter size (Params) for each data/model combination.
- On the Weishi dataset, for example, Bo-NextItNet achieves MRR@5 = 0.1063 and HR@5 = 0.1766 with a training time (TT) of 66 min and 34M parameters
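For reference, the metrics listed above can be computed per test interaction from the 1-based rank of the ground-truth item; this is a minimal sketch, not the authors' evaluation code.

```python
import math

# Minimal per-interaction ranking metrics, given the 1-based rank of the
# single ground-truth item in the model's ranked list and cutoff N.
def mrr_at_n(rank: int, n: int) -> float:
    return 1.0 / rank if rank <= n else 0.0

def hr_at_n(rank: int, n: int) -> float:
    return 1.0 if rank <= n else 0.0

def ndcg_at_n(rank: int, n: int) -> float:
    # With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 1).
    return 1.0 / math.log2(rank + 1) if rank <= n else 0.0

print(mrr_at_n(3, 5), hr_at_n(3, 5), ndcg_at_n(3, 5))  # 0.333..., 1.0, 0.5
```

In practice the reported numbers are these quantities averaged over all test interactions.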

Conclusion

- The authors have proposed CpRec, a flexible and generic neural network compression framework for learning compact sequential recommender models.
- CpRec significantly reduces the parameter size of both the input and softmax layers by leveraging the inherent long-tailed item distribution.
- CpRec performs further compression by a series of layer-wise parameter sharing methods.
- The authors expect CpRec to be valuable for existing SRS based on deep neural networks
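The layer-wise sharing idea can be sketched as below; the layer count, dimension, and tanh residual block are illustrative assumptions rather than CpRec's exact architecture.

```python
import numpy as np

# Sketch of cross-layer parameter sharing: all middle layers reuse one
# weight matrix, so depth no longer multiplies the parameter count.
d, num_layers = 64, 8
shared_w = np.random.randn(d, d) * 0.01    # the single shared weight

def forward(x: np.ndarray) -> np.ndarray:
    for _ in range(num_layers):            # every layer applies shared_w
        x = np.tanh(x @ shared_w) + x      # residual connection keeps depth trainable
    return x

unshared = num_layers * d * d              # one matrix per layer
shared = d * d                             # one matrix total
print(forward(np.zeros(d)).shape, unshared // shared)  # (64,), 8
```

The adjacent-layer and block-wise variants in the paper interpolate between these two extremes by sharing within groups of layers rather than across all of them.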


- Table1: Statistics of the evaluated datasets. "M" and "K" are short for million and kilo (thousand); "t" is the length of interaction sequences. For ColdRec, the left and right values divided by '/' denote the source and target dataset, respectively
- Table2: Overall performance comparison, including recommendation accuracy, parameter efficiency (Params), training time and inference speedup (evaluated by the generation of top-5 items). We omit the Params, Training Time (min) and Inference Speedup for GRU4Rec and Caser since they are not comparable to CpRec. MostPop returns item lists ranked by popularity. CpRec with cross-layer [18], cross-block, adjacent-layer and adjacent-block parameter sharing is referred to as CpRec-Cl, CpRec-Cb, CpRec-Al and CpRec-Ab, respectively
- Table3: Performance comparison w.r.t. how the block-wise embedding decomposition is applied. NextItNet variants that use block-wise decomposition in the input layer, output layer, and both are referred to as Bi-NextItNet, Bo-NextItNet and Bio-NextItNet, respectively. B1-NextItNet employs the standard low-rank decomposition (i.e., with only 1 block) in the input and softmax layers, inspired by [18]. Note that for clarity only the parameters in the input and output matrices are reported in the Params column. TT is short for training time (unit: min). The inference speedup is omitted due to similar results as in Table 2
- Table4: The impact of layer-wise parameter sharing strategies. NextItNet with cross-layer, cross-block, adjacent-layer and adjacent-block parameter sharing is denoted by Cl-NextItNet, Cb-NextItNet, Al-NextItNet and Ab-NextItNet, respectively. Note that for clarity only the parameters in the middle layers are shown in the Params column
- Table5: The effect of adaptive embedding decomposition applied to GRU4Rec. TT is short for training time (unit: min)
- Table6: CpRec vs. NextItNet on the transfer learning task. Note that our evaluation strictly follows [41]. MRR@5 & HR@5 are the fine-tuned accuracy, whereas Params and training time are evaluated on the pre-trained model, which is computationally more expensive than the fine-tuned model

Related work

- 2.1 DNN-based SRS

Recently, deep neural networks (DNNs) have brought great improvements to SRS and almost dominate this field. Thus far, three types of DNN models have been explored for SRS. Among them, recurrent neural networks (RNNs) are often a natural choice for modeling sequence data [8]. GRU4Rec [14, 29] is regarded as the seminal work that first applied the gated recurrent unit (GRU) architecture to sequential recommendation tasks. Inspired by it, a variety of RNN variants have been proposed to address sequential recommendation problems, such as personalized SRS with hierarchical RNN [39], content- & context-based SRS [7, 26], and data augmentation-based SRS [29]. While effective, these RNN-based models depend heavily on the hidden state of the entire past, which cannot take full advantage of modern parallel processing resources such as GPUs/TPUs [42]. By contrast, convolutional neural networks (CNNs) and pure attention-based models do not have such limitations, since the entire sequence is already available during training. In addition, CNN- and attention-based sequential models can perform better than RNN recommenders, since many more hidden layers can be stacked via the residual block architecture [11]. To be more specific, [42] proposed a CNN-based generative model called NextItNet, which employs a stack of dilated convolutional layers to increase the receptive field when modeling long-range sequences. Likewise, self-attention based models such as SASRec [17] and BERT4Rec [28] have also obtained competitive results. Compared with NextItNet, the self-attention mechanism is computationally more expensive, since calculating self-attention over all timesteps requires quadratic time and memory.
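To make the dilation argument concrete, the arithmetic below (with an assumed kernel size of 3 and a NextItNet-style doubling dilation schedule) shows how the receptive field of a stacked dilated CNN grows much faster than that of an undilated stack of the same depth.

```python
# Receptive field of a stack of 1D causal convolutions: each layer adds
# (kernel_size - 1) * dilation input positions. Kernel size 3 is an assumption.
def receptive_field(kernel_size: int, dilations: list) -> int:
    rf = 1
    for dil in dilations:
        rf += (kernel_size - 1) * dil
    return rf

plain = receptive_field(3, [1, 1, 1, 1])    # undilated 4-layer stack
dilated = receptive_field(3, [1, 2, 4, 8])  # doubling dilation schedule
print(plain, dilated)  # 9 vs 31 positions covered
```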

Reference

- Alexei Baevski and Michael Auli. 2018. Adaptive input representations for neural language modeling. arXiv preprint arXiv:1809.10853 (2018).
- Alexandre Boulch. 2017. Sharesnet: reducing residual network parameter number by sharing weights. arXiv preprint arXiv:1702.08782 (2017).
- Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz Kaiser. 2018. Universal Transformers. arXiv preprint arXiv:1807.03819 (2018).
- Misha Denil, Babak Shakibi, Laurent Dinh, Marc’Aurelio Ranzato, and Nando De Freitas. 2013. Predicting parameters in deep learning. In Advances in neural information processing systems. 2148–2156.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
- Yunchao Gong, Liu Liu, Ming Yang, and Lubomir Bourdev. 2014. Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115 (2014).
- Youyang Gu, Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Learning to refine text based recommendations. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2103–2108.
- Guibing Guo, Shichang Ouyang, Xiaodong He, Fajie Yuan, and Xiaohua Liu. 2019. Dynamic item block and prediction enhancing block for sequential recommendation. In Proc. Int. Joint Conf. Artif. Intell.(IJCAI).
- Song Han, Huizi Mao, and William J. Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149 (2015).
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
- Ruining He and Julian McAuley. 2016. Fusing similarity models with markov chains for sparse sequential recommendation. In 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 191–200.
- Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. Proceedings of the 43th International ACM SIGIR conference on Research and Development in Information Retrieval (2020).
- Balazs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015).
- Longke Hu, Aixin Sun, and Yong Liu. 2014. Your neighbors affect your ratings: on geographical neighborhood influence to rating prediction. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval. ACM, 345–354.
- Bin Jiang. 2013. Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution. The Professional Geographer 65, 3 (2013), 482–494.
- Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 197–206.
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2019).
- Hai-Son Le, Ilya Oparin, Alexandre Allauzen, Jean-Luc Gauvain, and Francois Yvon. 2011. Structured output layer neural network language model. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 5524–5527.
- Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 1419–1428.
- Pasquale Lops, Marco De Gemmis, and Giovanni Semeraro. 2011. Content-based recommender systems: State of the art and trends. In Recommender systems handbook. Springer, 73–105.
- Juergen Luettin, Susanne Rothermel, and Mark Andrew. 2019. Future of in-vehicle recommendation systems @ Bosch. In Proceedings of the 13th ACM Conference on Recommender Systems. 524–524.
- Shilin, Fajie Yuan, Guibing Guo, Liguang Zhang, and Wei Wei. 2020. CmnRec: Sequential Recommendations with Chunk-accelerated Memory Network. arXiv preprint arXiv:2004.13401 (2020).
- Tara N Sainath, Brian Kingsbury, Vikas Sindhwani, Ebru Arisoy, and Bhuvana Ramabhadran. 2013. Low-rank matrix factorization for deep neural network training with high-dimensional output targets. In 2013 IEEE international conference on acoustics, speech and signal processing. IEEE, 6655–6659.
- Badrul Munir Sarwar, George Karypis, Joseph A Konstan, John Riedl, et al. 2001. Item-based collaborative filtering recommendation algorithms. Www 1 (2001), 285–295.
- Elena Smirnova and Flavian Vasile. 2017. Contextual sequence modeling for recommendation with recurrent neural networks. In Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems. ACM, 2–9.
- Suraj Srinivas and R Venkatesh Babu. 2015. Data-free parameter pruning for deep neural networks. arXiv preprint arXiv:1507.06149 (2015).
- Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. arXiv preprint arXiv:1904.06690 (2019).
- Yong Kiam Tan, Xinxing Xu, and Yong Liu. 2016. Improved recurrent neural networks for session-based recommendations. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 17–22.
- Jiaxi Tang, Francois Belletti, Sagar Jain, Minmin Chen, Alex Beutel, Can Xu, and Ed H Chi. 2019. Towards neural mixture recommender for long range dependent user sequences. In The World Wide Web Conference. ACM, 1782–1793.
- Jiaxi Tang and Ke Wang. 2018. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 565–573.
- Jiaxi Tang and Ke Wang. 2018. Ranking distillation: Learning compact ranking models with high performance for recommender system. In Proceedings of the
- Wenting Tu, David W Cheung, Nikos Mamoulis, Min Yang, and Ziyu Lu. 2015. Activity-partner recommendation. In PAKDD. 591–604.
- Vincent Vanhoucke, Andrew Senior, and Mark Z Mao. 2011. Improving the speed of neural networks on CPUs. (2011).
- Jingyi Wang, Qiang Liu, Zhaocheng Liu, and Shu Wu. 2019. Towards Accurate and Interpretable Sequential Prediction: A CNN & Attention-Based Feature Extractor. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 1703–1712.
- Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, and Dell Zhang. 2017. Irgan: A minimax game for unifying generative and discriminative information retrieval models. In Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 515–524.
- Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, and Jian Cheng. 2016. Quantized convolutional neural networks for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4820–4828.
- Yuhui Xu, Yongzhuang Wang, Aojun Zhou, Weiyao Lin, and Hongkai Xiong. 2018. Deep neural network compression with single and multiple level quantization. In Thirty-Second AAAI Conference on Artificial Intelligence.
- Haochao Ying, Fuzhen Zhuang, Fuzheng Zhang, Yanchi Liu, Guandong Xu, Xing Xie, Hui Xiong, and Jian Wu. 2018. Sequential recommender system based on hierarchical attention networks. In the 27th International Joint Conference on Artificial Intelligence.
- Fajie Yuan, Guibing Guo, Joemon M Jose, Long Chen, Haitao Yu, and Weinan Zhang. 2016. Lambdafm: learning optimal ranking with factorization machines using lambda surrogates. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 227–236.
- Fajie Yuan, Xiangnan He, Alexandros Karatzoglou, and Liguang Zhang. 2020. Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation. arXiv preprint arXiv:2001.04253 (2020).
- Fajie Yuan, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M Jose, and Xiangnan He. 2019. A Simple Convolutional Generative Network for Next Item Recommendation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. ACM, 582–590.
- Fajie Yuan, Xin Xin, Xiangnan He, Guibing Guo, Weinan Zhang, Chua Tat-Seng, and Joemon M Jose. 2018. fBGD: Learning embeddings from positive unlabeled data with BGD. (2018).
- Sergey Zagoruyko and Nikos Komodakis. 2016. Wide residual networks. arXiv preprint arXiv:1605.07146 (2016).
