Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets

Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22), 2022

Abstract
Deep learning (DL) has revolutionized unstructured data analytics. But in most cases, DL needs massive labeled datasets and large compute clusters, which hinders its adoption. These limitations can be overcome with a popular paradigm called deep transfer learning (DTL). With DTL, one adapts a pre-trained DL model instead of training a model from scratch, which reduces the training data and compute needed to obtain a model. During adaptation, a common practice is to freeze most parts of the pre-trained model and adapt only the remaining ones. Since no single adaptation scheme is universally the best, one often evaluates several schemes, a process also called model selection. We also observed that data labeling for DTL is seldom a one-off process: one often updates the labeled dataset intermittently by adding new labeled examples and then performs model selection to evaluate the accuracy of the trained models. Today, this workload is executed by running computations over the entire pre-trained model and repeating them for every model selection cycle. This approach incurs redundant computations in the frozen model parts and causes usability and system inefficiency issues. In this work, we reimagine DTL model selection in the presence of frozen layers as an instance of multi-query optimization and propose two optimizations that reduce redundant computations and training overheads. We implement our optimizations in a data system called NAUTILUS. Experiments with end-to-end workloads on benchmark datasets show that NAUTILUS reduces DTL model selection runtimes by up to 5X compared to current practice.
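To make the core idea concrete, below is a minimal PyTorch sketch of the optimization the abstract describes: because frozen layers produce identical outputs for every adaptation scheme, their features can be materialized once and shared across all model selection candidates instead of being recomputed in every training cycle. This is an illustration under assumed choices, not NAUTILUS's actual API; the ResNet-18 backbone, the candidate heads, and helpers like materialize_frozen_features and train_head are all hypothetical.

```python
# Sketch: cache the frozen-prefix features once, then train every
# candidate adaptation scheme on the cached features. Illustrative
# only; not the paper's actual system interface or search space.
import torch
import torch.nn as nn
import torchvision.models as models
from torch.utils.data import DataLoader, TensorDataset

# Assumed setup: a pre-trained backbone with all parameters frozen.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()          # drop the original classifier head
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

def materialize_frozen_features(loader):
    """Run the frozen backbone once and cache its outputs, so later
    model selection cycles skip this redundant forward computation."""
    feats, labels = [], []
    with torch.no_grad():
        for x, y in loader:
            feats.append(backbone(x))
            labels.append(y)
    return torch.cat(feats), torch.cat(labels)

def train_head(head, feats, labels, epochs=5):
    """Train one candidate trainable suffix directly on cached features."""
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(head(feats), labels)
        loss.backward()
        opt.step()
    return head

# Hypothetical adaptation schemes (trainable suffixes) to compare.
candidate_heads = [
    nn.Linear(512, 10),
    nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)),
]

# Tiny synthetic stand-in for a labeled dataset (3x224x224 images).
data = TensorDataset(torch.randn(32, 3, 224, 224),
                     torch.randint(0, 10, (32,)))
loader = DataLoader(data, batch_size=8)

feats, labels = materialize_frozen_features(loader)   # computed once
trained = [train_head(h, feats, labels) for h in candidate_heads]
```

When new labeled data arrives, only the new examples need to pass through the frozen backbone before the cache is reused, which is the intuition behind treating repeated model selection cycles as a multi-query optimization problem.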
Keywords
Systems for Machine Learning, Deep Transfer Learning, Multi-Query Optimization