Unifying Gradients to Improve Real-World Robustness for Deep Networks

ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY (2023)

Abstract
The wide application of deep neural networks (DNNs) demands increasing attention to their real-world robustness, i.e., whether a DNN resists black-box adversarial attacks. Among these, score-based query attacks (SQAs) are the most threatening because they can effectively hurt a victim network with access only to model outputs. Defending against SQAs requires a slight but artful variation of outputs, because legitimate users receive the same output information as the attacker. In this article, we propose a real-world defense that Unifies Gradients (UniG) of different data so that SQAs can only probe a much weaker attack direction that is similar across samples. Since such universal attack perturbations have been validated as less aggressive than input-specific perturbations, UniG protects real-world DNNs by presenting attackers with a twisted and less informative attack direction. We implement UniG efficiently as a plug-and-play Hadamard product module. Extensive experiments on 5 SQAs, 2 adaptive attacks, and 7 defense baselines show that UniG significantly improves real-world robustness without hurting clean accuracy on CIFAR10 and ImageNet. For instance, UniG maintains 77.80% accuracy under a 2500-query Square attack on CIFAR10, while the state-of-the-art adversarially trained model achieves only 67.34%. Simultaneously, UniG outperforms all compared baselines in clean accuracy and makes the smallest modification to the model output. The code is released at https://github.com/snowien/UniG-pytorch.
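To make the "plug-and-play Hadamard product module" concrete, below is a minimal PyTorch sketch of such a layer: an element-wise multiplicative weight inserted into a pretrained classifier. The module and parameter names, the backbone, and the feature size are illustrative assumptions, not the authors' implementation; see the linked repository for the actual UniG code.

```python
import torch
import torch.nn as nn


class HadamardModule(nn.Module):
    """Hypothetical sketch of a plug-and-play Hadamard-product layer.

    It rescales an intermediate feature map element-wise with a learned
    weight, which is one way such a module could reshape the gradient
    information that score-based query attacks estimate from outputs.
    """

    def __init__(self, feature_shape):
        super().__init__()
        # One multiplicative weight per feature element, initialized to 1
        # so the module starts as an identity mapping.
        self.weight = nn.Parameter(torch.ones(feature_shape))

    def forward(self, x):
        return x * self.weight  # Hadamard (element-wise) product


# Usage sketch: splice the module into a classifier before its head.
# The backbone and the 512-dim feature size are placeholders.
backbone = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512),
    nn.ReLU(),
)
model = nn.Sequential(backbone, HadamardModule((512,)), nn.Linear(512, 10))

x = torch.randn(4, 3, 32, 32)  # a CIFAR10-sized batch
print(model(x).shape)          # torch.Size([4, 10])
```

Because the module is a single element-wise product, it adds negligible inference cost and can be attached to an already-trained network, which is what makes this style of defense practical for deployed models.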
Keywords
Black-box adversarial attack, practical adversarial defense