Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy

NeurIPS (2023)

Abstract
Data pruning, which aims to downsize a large training set into a small informative subset, is crucial for reducing the enormous computational costs of modern deep learning. Though large-scale data collections invariably contain annotation noise and numerous robust learning methods have been developed, data pruning for the noise-robust learning scenario has received little attention. With state-of-the-art Re-labeling methods that self-correct erroneous labels while training, it is challenging to identify which subset induces the most accurate re-labeling of erroneous labels in the entire training set. In this paper, we formalize the problem of data pruning with re-labeling. We first show that the likelihood of a training example being correctly re-labeled is proportional to the prediction confidence of its neighborhood in the subset. Therefore, we propose a novel data pruning algorithm, Prune4Rel, that finds a subset maximizing the total neighborhood confidence of all training examples, thereby maximizing the re-labeling accuracy and generalization performance. Extensive experiments on four real and one synthetic noisy datasets show that Prune4Rel outperforms the baselines with Re-labeling models by up to 9.1% as well as those with a standard model by up to 21.6%.
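The idea of selecting a subset that maximizes total neighborhood confidence can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the neighborhood definition, the objective's exact form, or the optimization procedure. The sketch below assumes cosine similarity with a hypothetical threshold `tau` to define neighborhoods, a hypothetical per-example cap `cap` so that already well-covered examples yield diminishing returns, and a plain greedy search. The function name and all parameters are illustrative.

```python
import numpy as np

def greedy_neighborhood_confidence(emb, conf, k, tau=0.5, cap=1.0):
    """Greedily pick up to k indices that maximize a capped sum of
    neighborhood confidence over ALL examples (illustrative only).

    emb  : (n, d) array of nonzero feature embeddings (assumed given)
    conf : (n,) array of prediction confidences in [0, 1] (assumed given)
    tau  : assumed cosine-similarity threshold defining a neighborhood
    cap  : assumed upper bound on each example's counted confidence
    """
    n = len(emb)
    # Normalize rows so that dot products are cosine similarities.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    # weight[i, j]: confidence that candidate j adds to example i's
    # neighborhood if j is selected (zero when j is not a neighbor of i).
    weight = (sim >= tau) * conf[None, :]

    covered = np.zeros(n)              # neighborhood confidence so far
    available = np.ones(n, dtype=bool)
    selected = []
    for _ in range(min(k, n)):
        current = np.minimum(covered, cap).sum()
        # Objective value after adding each candidate j, minus current value.
        gains = np.minimum(covered[:, None] + weight, cap).sum(axis=0) - current
        gains[~available] = -np.inf    # never re-select an example
        j = int(np.argmax(gains))
        selected.append(j)
        available[j] = False
        covered += weight[:, j]
    return selected
```

On two well-separated clusters, the cap pushes the second pick into the uncovered cluster rather than toward a second high-confidence example from the cluster that is already covered. Without the cap, the greedy search would simply pick the candidates with the largest confidence-weighted neighborhoods, regardless of overlap.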
Keywords
robust data pruning,label noise,accuracy,re-labeling