Robust Diffusion Models for Adversarial Purification
arXiv (2024)
Abstract
Diffusion model (DM)-based adversarial purification (AP) has been shown to be a powerful alternative to adversarial training (AT). However, these methods neglect the fact that pre-trained diffusion models are themselves not robust to adversarial attacks. Additionally, the diffusion process can easily destroy semantic information: after the reverse process it may generate a high-quality image that is nonetheless entirely different from the original input, degrading standard accuracy. A natural remedy is to retrain or fine-tune the pre-trained diffusion model with an adversarial training strategy, but doing so is computationally prohibitive. We propose a novel robust reverse process with adversarial guidance, which is independent of the given pre-trained DMs and avoids retraining or fine-tuning them. This robust guidance not only ensures that the purified examples retain more semantic content but also mitigates, for the first time, the accuracy-robustness trade-off of DMs, giving DM-based AP an efficient ability to adapt to new attacks. Extensive experiments demonstrate that our method achieves state-of-the-art results and generalizes across different attacks.
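To make the pipeline the abstract describes concrete, here is a minimal sketch of diffusion-based purification with an added guidance term in the reverse process. The `denoise` and `guidance` interfaces, the noise level `t_star`, and the toy stand-in functions are all illustrative assumptions, not the authors' actual method or code.

```python
import numpy as np

def purify(x_adv, denoise, guidance, t_star=0.3, steps=10, rng=None):
    """Sketch of DM-based adversarial purification.

    1) Forward-diffuse the (possibly adversarial) input to noise level
       t_star, drowning out the adversarial perturbation.
    2) Run a reverse process where each step combines the pre-trained
       denoiser with a guidance term that pulls the trajectory back
       toward the semantics of the input (hypothetical interface).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Forward diffusion (variance-preserving style): partially noise the input.
    alpha = 1.0 - t_star
    x = np.sqrt(alpha) * x_adv + np.sqrt(1.0 - alpha) * rng.standard_normal(x_adv.shape)
    # Guided reverse process: denoising update plus the guidance correction.
    dt = t_star / steps
    for _ in range(steps):
        x = x + dt * (denoise(x) + guidance(x, x_adv))
    return x

# Toy stand-ins: the "denoiser" shrinks toward zero, the "guidance"
# pulls the sample back toward the original input.
toy_denoise = lambda x: -x
toy_guidance = lambda x, x0: 2.0 * (x0 - x)

x_adv = np.ones(4)          # pretend adversarial input
x_pur = purify(x_adv, toy_denoise, toy_guidance)
```

The key design point the abstract emphasizes is that `guidance` is applied only at sampling time, so the pre-trained denoiser is used as-is, with no retraining or fine-tuning.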