Real-Fake: Effective Training Data Synthesis Through Distribution Matching
arXiv (Cornell University) (2023)
Abstract
Synthetic training data has gained prominence in numerous learning tasks and
scenarios, offering advantages such as dataset augmentation, generalization
evaluation, and privacy preservation. Despite these benefits, the efficiency of
synthetic data generated by current methodologies remains inferior when it is
used exclusively to train advanced deep models, limiting its practical utility. To
address this challenge, we analyze the principles underlying training data
synthesis for supervised learning and elucidate a principled theoretical
framework from the distribution-matching perspective that explicates the
mechanisms governing synthesis efficacy. Through extensive experiments, we
demonstrate the effectiveness of our synthetic data across diverse image
classification tasks, both as a replacement for and an augmentation to real
datasets, while also benefiting challenging tasks such as out-of-distribution
generalization and privacy preservation.
Keywords
Training Data Synthesis, Distribution Matching, Data Efficiency