Clustering based Behavior Sampling with Long Sequential Data for CTR Prediction

Yuren Zhang,Enhong Chen,Binbin Jin,Hao Wang,Min Hou,Wei Huang,Runlong Yu

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval（2022）

引用 9|浏览128

暂无评分

摘要

Click-through rate (CTR) prediction is fundamental in many industrial applications, such as online advertising and recommender systems. With the development of the online platforms, the sequential user behaviors grow rapidly, bringing us great opportunity to better understand user preferences.However, it is extremely challenging for existing sequential models to effectively utilize the entire behavior history of each user. First, there is a lot of noise in such long histories, which can seriously hurt the prediction performance. Second, feeding the long behavior sequence directly results in infeasible inference time and storage cost. In order to tackle these challenges, in this paper we propose a novel framework, which we name as User Behavior Clustering Sampling (UBCS). In UBCS, short sub-sequences will be obtained from the whole user history sequence with two cascaded modules: (i) Behavior Sampling module samples short sequences related to candidate items using a novel sampling method which takes relevance and temporal information into consideration; (ii) Item Clustering module clusters items into a small number of cluster centroids, mitigating the impact of noise and improving efficiency. Then, the sampled short sub-sequences will be fed into the CTR prediction module for efficient prediction. Moreover, we conduct a self-supervised consistency pre-training task to extract user persona preference and optimize the sampling module effectively. Experiments on real-world datasets demonstrate the superiority and efficiency of our proposed framework.

查看译文

关键词

CTR Prediction, Information Retrieval, Long Sequential User Behavior Modeling

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要