Distributional Dataset Distillation with Subtask Decomposition
arXiv (2024)
Abstract
What does a neural network learn when training from a task-specific dataset?
Synthesizing this knowledge is the central idea behind Dataset Distillation,
which recent work has shown can be used to compress large datasets into a small
set of input-label pairs (prototypes) that capture essential aspects
of the original dataset. In this paper, we make the key observation that
existing methods that distill into explicit prototypes are often suboptimal,
incurring unexpected storage costs from the distilled labels. In response, we
propose Distributional Dataset Distillation (D3), which encodes the data using
minimal sufficient per-class statistics paired with a decoder, distilling the
dataset into a compact distributional representation that is more
memory-efficient than prototype-based methods. To scale up the process
of learning these representations, we propose Federated
distillation, which decomposes the dataset into subsets, distills them in
parallel using sub-task experts and then re-aggregates them. We thoroughly
evaluate our algorithm on a three-dimensional metric and show that our method
achieves state-of-the-art results on TinyImageNet and ImageNet-1K.
Specifically, we outperform the prior art by 6.9% on ImageNet-1K under the
storage budget of 2 images per class.
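
To make the distributional idea concrete, here is a minimal sketch (not the authors' code) of how a distilled dataset could be stored as per-class latent Gaussian statistics plus one shared decoder, with synthetic training images obtained by sampling latents and decoding them. All class counts, latent sizes, and the decoder architecture below are illustrative assumptions.

```python
# Illustrative sketch of a distributional distilled dataset:
# per-class latent statistics + a shared decoder (assumed architecture).
import torch
import torch.nn as nn

class ClassDistribution(nn.Module):
    """Per-class latent statistics: a mean and diagonal log-variance per class."""
    def __init__(self, num_classes: int, latent_dim: int):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(num_classes, latent_dim))
        self.log_var = nn.Parameter(torch.zeros(num_classes, latent_dim))

    def sample(self, labels: torch.Tensor) -> torch.Tensor:
        # Reparameterized draw: z = mu + sigma * eps, one latent per requested label.
        eps = torch.randn(labels.shape[0], self.mu.shape[1], device=labels.device)
        return self.mu[labels] + torch.exp(0.5 * self.log_var[labels]) * eps

class Decoder(nn.Module):
    """Small shared decoder mapping latents to images (architecture is illustrative)."""
    def __init__(self, latent_dim: int, img_channels: int = 3, img_size: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, img_channels * img_size * img_size), nn.Tanh(),
        )
        self.shape = (img_channels, img_size, img_size)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z).view(z.shape[0], *self.shape)

# Usage: draw one synthetic image per class and feed the batch to any student model.
dist = ClassDistribution(num_classes=10, latent_dim=64)
decoder = Decoder(latent_dim=64)
labels = torch.arange(10)
synthetic_images = decoder(dist.sample(labels))  # shape (10, 3, 32, 32)
```

Storing only the per-class statistics and one decoder is what makes this representation more compact than keeping explicit image-label prototypes.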
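
The subtask decomposition described above can likewise be sketched as an orchestration loop: partition the label space, distill each subset with its own expert, then re-aggregate. The sketch below is an assumption about the overall structure, not the authors' implementation; the experts run sequentially here but could be launched in parallel.

```python
# Illustrative sketch of federated distillation via subtask decomposition.
import torch

def split_classes(num_classes: int, num_subtasks: int) -> list[list[int]]:
    """Partition class ids into roughly equal, disjoint subtasks."""
    ids = list(range(num_classes))
    step = (num_classes + num_subtasks - 1) // num_subtasks
    return [ids[i:i + step] for i in range(0, num_classes, step)]

def distill_subtask(class_ids: list[int], latent_dim: int = 64) -> dict:
    """Placeholder for a sub-task expert: returns per-class statistics.
    A real expert would optimize these against its subset of the original data."""
    return {
        "classes": class_ids,
        "mu": torch.zeros(len(class_ids), latent_dim),
        "log_var": torch.zeros(len(class_ids), latent_dim),
    }

def aggregate(results: list[dict], num_classes: int, latent_dim: int = 64):
    """Re-assemble per-subtask statistics into one table indexed by class id."""
    mu = torch.zeros(num_classes, latent_dim)
    log_var = torch.zeros(num_classes, latent_dim)
    for r in results:
        for row, c in enumerate(r["classes"]):
            mu[c] = r["mu"][row]
            log_var[c] = r["log_var"][row]
    return mu, log_var

subtasks = split_classes(num_classes=200, num_subtasks=4)  # e.g. TinyImageNet's 200 classes
results = [distill_subtask(ids) for ids in subtasks]       # sub-task experts
mu, log_var = aggregate(results, num_classes=200)
```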