DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification
NeurIPS 2023
Abstract
Diffusion-based purification defenses leverage diffusion models to remove
crafted perturbations of adversarial examples and achieve state-of-the-art
robustness. Recent studies show that even advanced attacks cannot break such
defenses effectively, since the purification process induces an extremely deep
computational graph, which poses the potential problems of gradient obfuscation,
high memory cost, and unbounded randomness. In this paper, we propose a unified
framework DiffAttack to perform effective and efficient attacks against
diffusion-based purification defenses, including both DDPM and score-based
approaches. In particular, we propose a deviated-reconstruction loss at
intermediate diffusion steps to induce inaccurate density gradient estimation
to tackle the problem of vanishing/exploding gradients. We also provide a
segment-wise forwarding-backwarding algorithm, which leads to memory-efficient
gradient backpropagation. We validate the attack effectiveness of DiffAttack
compared with existing adaptive attacks on CIFAR-10 and ImageNet. We show that
DiffAttack decreases the robust accuracy of models compared with SOTA attacks
by over 20% on CIFAR-10 and over 10% on ImageNet. We conduct a
series of ablation studies, and we find that 1) DiffAttack with the
deviated-reconstruction loss added over uniformly sampled time steps is more
effective than that added over only initial/final steps, and 2) diffusion-based
purification with a moderate diffusion length is more robust under DiffAttack.
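The deviated-reconstruction loss described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: all names are assumptions, and the intermediate reconstructions are represented as flat vectors rather than image tensors. The idea is that the attacker maximizes the deviation between intermediate reconstructions of the perturbed input and reference reconstructions of the clean input at the same (uniformly sampled) diffusion steps.

```python
# Hedged sketch of a deviated-reconstruction loss over uniformly sampled
# diffusion steps. All names are illustrative, not the paper's code.

def deviated_reconstruction_loss(recons, refs):
    """recons, refs: lists of equal-length flat vectors, one pair per
    sampled time step t. Returns the mean squared deviation summed over
    steps; the attacker performs gradient *ascent* on this quantity so
    the reconstructions drift apart and the density-gradient estimate
    at intermediate steps becomes inaccurate."""
    total = 0.0
    for x_hat, x_ref in zip(recons, refs):
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x_ref)) / len(x_hat)
    return total
```

Summing the per-step deviations (rather than using only the initial or final step) mirrors the ablation finding that spreading the loss over uniformly sampled steps is more effective.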
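The segment-wise forwarding-backwarding idea can be illustrated with a toy checkpointing scheme. This sketch is an assumption-laden analogue, not the paper's algorithm: the deep purification graph is stood in for by a chain of scalar functions with known derivatives. Only every `seg`-th activation is stored during the forward pass; during the backward pass, each segment's intermediate activations are recomputed from the nearest checkpoint, trading extra compute for memory.

```python
# Hedged sketch of segment-wise forward/backward (gradient checkpointing
# over a deep chain). Function and variable names are illustrative.

def forward_with_checkpoints(x, fs, seg):
    """Apply the chain fs[0..n-1] to x, storing only every seg-th
    activation (plus the input). Returns (output, checkpoints)."""
    ckpts = [x]
    a = x
    for i, f in enumerate(fs, 1):
        a = f(a)
        if i % seg == 0:
            ckpts.append(a)
    return a, ckpts

def backward_from_checkpoints(fs, dfs, ckpts, seg):
    """Backpropagate d(output)/d(input) through the chain, recomputing
    each segment's intermediate activations from its checkpoint.
    dfs[i] is the derivative of fs[i]."""
    grad = 1.0  # d(output)/d(output)
    n = len(fs)
    for s in range(len(ckpts) - 1, -1, -1):
        start = s * seg
        end = min(start + seg, n)
        # Recompute activations inside this segment only.
        acts = [ckpts[s]]
        for f in fs[start:end]:
            acts.append(f(acts[-1]))
        # Chain rule backwards through the segment.
        for i in range(end - 1, start - 1, -1):
            grad *= dfs[i](acts[i - start])
    return grad
```

Peak memory scales with the checkpoint count plus one segment's activations instead of the full graph depth, which is what makes backpropagating through a long purification trajectory feasible.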