On the Effectiveness of Distillation in Mitigating Backdoors in Pre-trained Encoder
arxiv(2024)
Abstract
In this paper, we study distillation, a defense originally proposed for supervised learning, as a countermeasure against poisoned encoders in self-supervised learning (SSL). Distillation aims to distill knowledge from a given model (the teacher net) and transfer it to another model (the student net). Here, we use it to distill benign knowledge from a poisoned pre-trained encoder and transfer it to a new encoder, yielding a clean pre-trained encoder. In particular, we conduct an empirical study of the effectiveness and performance of distillation against poisoned encoders. Using two state-of-the-art backdoor attacks against pre-trained image encoders and four commonly used image classification datasets, our experimental results show that distillation can reduce the attack success rate from 80.87%
Moreover, we investigate the impact of the three core components of distillation on performance: the teacher net, the student net, and the distillation loss. Comparing 4 different teacher nets, 3 student nets, and 6 distillation losses, we find that fine-tuned teacher nets, warm-up-training-based student nets, and attention-based distillation loss perform best, respectively.
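To make the best-performing component concrete, the following is a minimal sketch of an attention-based distillation loss in the style of attention transfer (matching normalized spatial attention maps between teacher and student feature maps). This is an illustrative reconstruction, not the paper's implementation: the function names, the NumPy stand-in for a deep learning framework, and the per-layer mean-squared formulation are all assumptions.

```python
import numpy as np

def attention_map(feat):
    # feat: (C, H, W) activation tensor (assumed layout).
    # Attention map = channel-wise sum of squared activations,
    # flattened and L2-normalized so scale differences between
    # teacher and student features do not dominate the loss.
    amap = np.sum(feat ** 2, axis=0).reshape(-1)
    return amap / (np.linalg.norm(amap) + 1e-8)

def attention_distillation_loss(teacher_feats, student_feats):
    # Mean over layers of the squared distance between the
    # normalized attention maps of matching teacher/student layers.
    return float(np.mean([
        np.sum((attention_map(t) - attention_map(s)) ** 2)
        for t, s in zip(teacher_feats, student_feats)
    ]))

# Toy usage with random features from one hypothetical layer.
rng = np.random.default_rng(0)
teacher_feats = [rng.normal(size=(8, 4, 4))]
student_feats = [rng.normal(size=(8, 4, 4))]
loss = attention_distillation_loss(teacher_feats, student_feats)
```

In a real distillation pipeline this term would be minimized with respect to the student encoder's weights, so the student reproduces the teacher's benign attention patterns without inheriting the backdoor trigger behavior.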