Dataset Distillation via the Wasserstein Metric
arXiv (2023)
Abstract
Dataset Distillation (DD) has emerged as a powerful strategy for condensing the
information in large datasets into much smaller synthetic datasets that preserve
model performance with reduced computational overhead. Pursuing this objective,
we use the Wasserstein distance, a metric grounded in optimal transport theory,
to improve distribution matching in DD. Our approach employs the Wasserstein
barycenter, which provides a geometrically meaningful way to quantify differences
between distributions and to efficiently capture the centroid of a set of
distributions. By embedding synthetic data in the feature spaces of pretrained
classification models, we enable effective distribution matching that leverages
the prior knowledge inherent in these models. Our method retains the
computational advantages of distribution matching-based techniques while
achieving new state-of-the-art performance on a range of high-resolution
datasets. Extensive experiments demonstrate the effectiveness and adaptability of
our method, underscoring the untapped potential of Wasserstein metrics in dataset
distillation.
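To make the two optimal transport notions in the abstract concrete, the following Python sketch works through a deliberately simplified 1-D case. It is not the authors' implementation: the function names (`wasserstein2_1d`, `barycenter_1d`, `match_step`), the learning rate `lr`, the equal barycenter weights, and all sample values are hypothetical. Every distribution is assumed to be a 1-D empirical sample of the same size, and plain scalars stand in for the pretrained-network features the method actually uses. In one dimension the optimal transport plan simply pairs sorted samples, so both the squared-Euclidean (W2) distance and the W2 barycenter have closed forms; in high-dimensional feature spaces they require an optimal transport solver instead.

```python
import math
from typing import List, Sequence

# Hypothetical helper names; a 1-D toy illustration, not the paper's implementation.


def wasserstein2_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """W2 distance between two equal-size 1-D empirical distributions.

    In 1-D, optimal transport pairs the k-th smallest point of `a` with the
    k-th smallest point of `b`, so no solver is needed.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sketch assumes non-empty samples of equal size")
    sq = sum((x - y) ** 2 for x, y in zip(sorted(a), sorted(b)))
    return math.sqrt(sq / len(a))


def barycenter_1d(dists: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """W2 barycenter of equal-size 1-D empirical distributions.

    The barycenter's quantile function is the weighted mean of the inputs'
    quantile functions, i.e. a position-wise weighted mean of sorted samples.
    """
    n = len(dists[0])
    if any(len(d) != n for d in dists):
        raise ValueError("sketch assumes samples of equal size")
    total = sum(weights)
    sorted_dists = [sorted(d) for d in dists]
    return [sum(w * d[k] for w, d in zip(weights, sorted_dists)) / total
            for k in range(n)]


def match_step(synthetic: List[float], target: Sequence[float],
               lr: float) -> List[float]:
    """One gradient-descent step on W2^2(synthetic, target) w.r.t. synthetic.

    W2^2 = (1/n) * sum_k (x_(k) - t_(k))^2 over sorted points, so the gradient
    for the k-th smallest synthetic point is 2 * (x_(k) - t_(k)) / n.
    """
    n = len(synthetic)
    order = sorted(range(n), key=lambda i: synthetic[i])  # rank -> original index
    t = sorted(target)
    updated = list(synthetic)
    for k, i in enumerate(order):
        updated[i] = synthetic[i] - lr * 2.0 * (synthetic[i] - t[k]) / n
    return updated


# Toy usage: two hypothetical "real" sample sets and a synthetic set to distill.
real_a = [0.0, 1.0, 2.0, 3.0]
real_b = [4.0, 5.0, 6.0, 7.0]
bary = barycenter_1d([real_a, real_b], [0.5, 0.5])
print(bary)                            # [2.0, 3.0, 4.0, 5.0]
print(wasserstein2_1d(real_a, bary))   # 2.0, half of W2(real_a, real_b) = 4.0

synthetic = [10.0, -3.0, 0.5, 2.5]
for _ in range(60):
    synthetic = match_step(synthetic, bary, lr=1.0)
print(wasserstein2_1d(synthetic, bary))  # close to 0 after matching
```

The barycenter sits exactly halfway along the W2 geodesic between the two inputs, which is the "centroid" behaviour the abstract refers to. Moving each synthetic point toward its optimally matched barycenter point is the 1-D analogue of Wasserstein distribution matching; under the abstract's description, the synthetic images themselves would presumably be optimized so that their embedded features approach such a target, a step this toy omits.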