Redeem Myself: Purifying Backdoors in Deep Learning Models using Self Attention Distillation

IEEE Symposium on Security and Privacy (S&P 2023)

Abstract
Recent works have revealed the vulnerability of deep neural networks to backdoor attacks, where a backdoored model orchestrates targeted or untargeted misclassification when activated by a trigger. A line of purification methods (e.g., fine-pruning, neural attention transfer, MCR [69]) has been proposed to remove the backdoor from a model. However, these methods either fail to reduce the attack success rate of more advanced backdoor attacks or largely degrade the model's prediction accuracy on clean samples. In this paper, we put forward a new purification defense framework, dubbed SAGE, which utilizes self-attention distillation to purge models of backdoors. Unlike traditional attention transfer mechanisms that require a teacher model to supervise the distillation process, SAGE can realize self-purification with a small number of clean samples. To enhance the defense performance, we further propose a dynamic learning rate adjustment strategy that carefully tracks the prediction accuracy on clean samples to guide the learning rate adjustment. We compare the defense performance of SAGE with 6 state-of-the-art defense approaches against 8 backdoor attacks on 4 datasets. It is shown that SAGE can reduce the attack success rate by as much as 90% with less than a 3% decrease in prediction accuracy on clean samples. We will open-source our code upon publication.
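The abstract describes self-attention distillation: intermediate feature maps of the network are reduced to spatial attention maps, and shallower layers are trained to mimic deeper ones without any teacher model. The sketch below illustrates that idea in a minimal, framework-agnostic form; the function names, the channel-wise mean-of-squares attention operator, and the equal-spatial-size assumption are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def attention_map(feat):
    # feat: (C, H, W) intermediate activations.
    # Collapse the channel dimension into a spatial attention map
    # A = mean_c feat_c^2, then flatten and L2-normalize (one common choice).
    a = (feat ** 2).mean(axis=0).ravel()
    return a / (np.linalg.norm(a) + 1e-12)

def self_attention_distill_loss(feats):
    # feats: list of intermediate feature maps ordered shallow -> deep,
    # assumed here to share the same spatial size (real implementations
    # resize shallower maps to match the deeper ones).
    # Each shallower layer is pulled toward the attention map of the
    # next deeper layer -- no external teacher model is needed.
    loss = 0.0
    for shallow, deep in zip(feats[:-1], feats[1:]):
        loss += np.mean((attention_map(shallow) - attention_map(deep)) ** 2)
    return loss
```

During purification, this distillation loss would be minimized (together with the usual classification loss on the small clean set), nudging the network's attention toward consistent, semantically meaningful regions and away from trigger-activated shortcuts.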
Keywords
Backdoor attacks, backdoor defenses, self-attention distillation, deep neural networks