CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models
Proceedings of the ACM on Management of Data (2024)
Abstract
Recently, the growing memory demands of embedding tables in Deep Learning Recommendation Models (DLRMs) pose great challenges for model training and deployment. Existing embedding compression solutions cannot simultaneously meet three key design requirements: memory efficiency, low latency, and adaptability to dynamic data distributions. This paper presents CAFE, a Compact, Adaptive, and Fast Embedding compression framework that addresses all of these requirements. The design philosophy of CAFE is to dynamically allocate more memory to important features (called hot features) and less memory to unimportant ones. In CAFE, we propose a fast and lightweight sketch data structure, named HotSketch, to capture feature importance and report hot features in real time. Each reported hot feature is assigned a unique embedding; for the non-hot features, multiple features share one embedding via the hash embedding technique. Guided by this design philosophy, we further propose a multi-level hash embedding framework to optimize the embedding tables of non-hot features. We theoretically analyze the accuracy of HotSketch and the model's convergence under embedding deviation. Extensive experiments show that CAFE significantly outperforms existing embedding compression methods, yielding 3.92% and 3.68% superior testing AUC on the Criteo Kaggle dataset and the CriteoTB dataset at a compression ratio of 10000x. The source code of CAFE is available on GitHub.
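The hot/non-hot split described above can be illustrated with a small sketch. The code below is a hypothetical, simplified illustration, not the paper's implementation: it stands in for HotSketch with a minimal SpaceSaving-style counter, and all class names, sizes, and thresholds (`SpaceSavingSketch`, `CafeLikeEmbedding`, `hot_threshold`, table dimensions) are illustrative assumptions. Hot features receive a dedicated embedding row, while non-hot features collide into a small shared hashed table.

```python
import numpy as np


class SpaceSavingSketch:
    """Minimal SpaceSaving-style counter (a stand-in for the paper's
    HotSketch): tracks approximate top-k feature frequencies in O(k) memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counters = {}  # feature_id -> estimated count

    def insert(self, key):
        if key in self.counters:
            self.counters[key] += 1
        elif len(self.counters) < self.capacity:
            self.counters[key] = 1
        else:
            # Evict the minimum counter and inherit its count (SpaceSaving rule).
            victim = min(self.counters, key=self.counters.get)
            self.counters[key] = self.counters.pop(victim) + 1

    def is_hot(self, key, threshold):
        return self.counters.get(key, 0) >= threshold


class CafeLikeEmbedding:
    """Illustrative lookup path: a frequency sketch decides whether a feature
    gets a private embedding row or a slot in a shared hashed table."""

    def __init__(self, dim=4, sketch_capacity=8, shared_rows=16,
                 hot_threshold=3, seed=0):
        self.rng = np.random.default_rng(seed)
        self.dim = dim
        self.sketch = SpaceSavingSketch(sketch_capacity)
        self.hot_threshold = hot_threshold
        self.hot_rows = {}  # feature_id -> private embedding row
        # Shared table: many non-hot features map onto these few rows.
        self.shared = self.rng.normal(0.0, 0.01, size=(shared_rows, dim))

    def lookup(self, feature_id):
        self.sketch.insert(feature_id)
        if self.sketch.is_hot(feature_id, self.hot_threshold):
            # Reported hot feature: allocate a unique embedding on first sight.
            if feature_id not in self.hot_rows:
                self.hot_rows[feature_id] = self.rng.normal(0.0, 0.01, self.dim)
            return self.hot_rows[feature_id]
        # Non-hot feature: share a row of the hashed table with other features.
        return self.shared[hash(feature_id) % len(self.shared)]
```

A frequently seen feature is promoted to its own row, while two rare features whose hashes collide modulo the table size return the same shared embedding, which is exactly the memory/accuracy trade-off the framework tunes per feature.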