Matchmaker: Data Drift Mitigation in Machine Learning for Large-Scale Systems

Ankur Mallick, Kevin Hsieh, Behnaz Arzani, Gauri Joshi

Conference on Machine Learning and Systems (MLSys), 2022

Abstract

Today’s data centers rely more heavily on machine learning (ML) in their deployed systems. However, these systems are vulnerable to the data drift problem, that is, a mismatch between training and test data, which can lead to significant performance degradation and system inefficiencies. In this paper, we demonstrate the impact of data drift in production by studying two real-world deployments in a leading cloud provider. Our study shows that, despite frequent model retraining, these deployed models experience major accuracy drops (up to 40%) and high accuracy variation, which lead to a drastic increase in operational costs. None of the current solutions to the data drift problem are designed for large-scale deployments, which need to address real-world issues such as scale, ground truth latency, and mixed types of data drift. We propose Matchmaker, the first scalable, adaptive, and flexible solution to the data drift problem in large-scale production systems. Matchmaker finds the most similar training data batch and uses the corresponding ML model for inference on each test point. As part of Matchmaker, we introduce a novel similarity metric to address multiple types of data drift while incurring only limited overhead. Experiments on our two real-world ML deployments show that Matchmaker significantly improves model accuracy (up to 14% and 2%), which saves 18% and 1% in operational costs. At the same time, Matchmaker provides 8× and 4× faster predictions than a state-of-the-art ML data drift solution, AUE.
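The per-point routing idea in the abstract (pick the most similar training batch, then use that batch's model) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not define Matchmaker's similarity metric, so squared Euclidean distance to each batch's centroid is used as a stand-in, and the names `BatchModel`, `centroid`, and `route_and_predict` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class BatchModel:
    """A training batch summarized by its feature centroid, paired with the model trained on it."""
    centroid: List[float]
    predict: Callable[[Sequence[float]], float]


def centroid(batch: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of a batch of feature vectors."""
    n = len(batch)
    return [sum(col) / n for col in zip(*batch)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in similarity (assumption): smaller distance means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def route_and_predict(point: Sequence[float], models: Sequence[BatchModel]) -> float:
    """Select the model whose training batch is closest to the test point, then predict with it."""
    best = min(models, key=lambda m: squared_distance(point, m.centroid))
    return best.predict(point)


# Two toy batches with constant "models" so the routing decision is easy to see.
batch_a = [[0.0, 0.0], [1.0, 1.0]]
batch_b = [[10.0, 10.0], [12.0, 12.0]]
models = [
    BatchModel(centroid(batch_a), lambda x: 0.0),
    BatchModel(centroid(batch_b), lambda x: 1.0),
]

print(route_and_predict([0.2, 0.4], models))    # routed to the model for batch_a
print(route_and_predict([11.5, 10.5], models))  # routed to the model for batch_b
```

A centroid comparison of this kind only reacts to shifts in the input features. The abstract states that Matchmaker's metric addresses multiple types of drift, so the actual method necessarily goes beyond this stand-in.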
