DCAF-BERT: A Distilled Cachable Adaptable Factorized Model for Improved Ads CTR Prediction
Companion Proceedings of the Web Conference 2022 (2022)
Abstract
In this paper we present a click-through-rate (CTR) prediction model for product advertisement at Amazon. CTR prediction is challenging because the model needs to (a) learn from text and numeric features, (b) maintain low latency at inference time, and (c) adapt to temporal shifts in the advertisement distribution. Our proposed model is DCAF-BERT, a novel lightweight, cache-friendly factorized model that consists of twin-structured BERT-like encoders for text with a late-fusion mechanism for tabular and numeric features. The factorization of the model allows for compartmentalised retraining, which enables the model to adapt easily to distribution shifts. The twin encoders are carefully trained to leverage historical CTR data, using a large pre-trained language model and cross-architecture knowledge distillation (KD). We empirically find the right combination of pretraining, distillation and fine-tuning strategies for teacher and student, which leads to a 1.7% ROC-AUC lift over the previous best model offline. In an online experiment we show that our compartmentalised refresh strategy boosts the CTR of DCAF-BERT by 3.6% on average over the baseline model, consistently across a month.
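The abstract describes a twin-encoder architecture whose ad-side embeddings can be cached offline, with tabular and numeric features fused late. The sketch below is an illustration only, not the authors' implementation: the class name DCAFBertSketch, the dimensions, and the fusion head are all assumptions made to show how such a factorized, cache-friendly model could be wired up in PyTorch.

```python
import torch
import torch.nn as nn


class DCAFBertSketch(nn.Module):
    """Minimal sketch (assumed, not the authors' code) of a twin-encoder CTR
    model with late fusion of text embeddings and tabular/numeric features."""

    def __init__(self, query_encoder: nn.Module, ad_encoder: nn.Module,
                 embed_dim: int, num_tabular: int):
        super().__init__()
        # Twin BERT-like encoders: one for the query text, one for the ad text.
        # Because the two sides are factorized, ad embeddings can be computed
        # and cached offline; only the query side must run at serving time.
        self.query_encoder = query_encoder
        self.ad_encoder = ad_encoder
        # Late-fusion head over [query_emb, ad_emb, tabular features];
        # retraining only this head is one way the model could be refreshed
        # without touching the text encoders.
        self.fusion = nn.Sequential(
            nn.Linear(2 * embed_dim + num_tabular, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, query_tokens: torch.Tensor, ad_tokens: torch.Tensor,
                tabular: torch.Tensor) -> torch.Tensor:
        q = self.query_encoder(query_tokens)  # (batch, embed_dim)
        a = self.ad_encoder(ad_tokens)        # (batch, embed_dim), cacheable offline
        fused = torch.cat([q, a, tabular], dim=-1)
        return torch.sigmoid(self.fusion(fused))  # predicted CTR in [0, 1]
```

In this sketch the cross-architecture knowledge distillation mentioned in the abstract would happen during training of the two encoders (matching a large teacher's representations), which is outside the forward pass shown here.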
Keywords
click-through-rate, language models, distillation, sponsored search