Exploring Non-target Knowledge for Improving Ensemble Universal Adversarial Attacks

AAAI 2023

Abstract
The ensemble attack with average weights can be leveraged to increase the transferability of a universal adversarial perturbation (UAP) by training with multiple Convolutional Neural Networks (CNNs). However, after analyzing the Pearson Correlation Coefficients (PCCs) between the ensemble logits and the individual logits for a UAP crafted by the ensemble attack, we find that one CNN plays a dominant role during optimization. Consequently, the average weighting strategy weakens the contributions of the other CNNs and thus limits transferability to unseen black-box CNNs. To address this bias, a natural first attempt is to leverage the Kullback–Leibler (KL) divergence loss to encourage a joint contribution from the different CNNs, but this is still insufficient. After decoupling the KL loss into a target-class part and a non-target-class part, we find that the main issue is that the non-target knowledge is significantly suppressed by the increasing logit of the target class. In this study, we simply adopt a KL loss that considers only the non-target classes to address the dominance bias. In addition, to further boost transferability, we incorporate a min-max learning framework that self-adjusts the ensemble weight of each CNN. Experimental results validate that the non-target KL loss achieves substantially better transferability than the original KL loss, and that min-max training provides a mutual benefit in adversarial ensemble attacks. The source code is available at: https://github.com/WJJLL/ND-MM.
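To make the two ingredients concrete, below is a minimal PyTorch sketch of (i) a non-target KL loss that masks out the target class and renormalizes both distributions before computing the divergence, and (ii) an exponentiated-gradient weight update used here as a simple stand-in for the paper's min-max weight adjustment. The function names, temperature T, and step size eta are illustrative assumptions, not the authors' released implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn.functional as F

def non_target_kl(logits_p, logits_q, target, T=1.0):
    """KL divergence restricted to the non-target classes.

    The target-class logit is dropped and both distributions are
    renormalized over the remaining classes, so a growing target
    logit cannot suppress the non-target knowledge.
    """
    rows = torch.arange(logits_p.size(0), device=logits_p.device)
    keep = torch.ones_like(logits_p, dtype=torch.bool)
    keep[rows, target] = False  # mask out the target column
    # each row loses exactly one entry, so reshape to (B, C-1)
    p = logits_p[keep].view(logits_p.size(0), -1) / T
    q = logits_q[keep].view(logits_q.size(0), -1) / T
    return F.kl_div(F.log_softmax(p, dim=1), F.softmax(q, dim=1),
                    reduction="batchmean") * T * T

def update_ensemble_weights(weights, per_model_losses, eta=0.1):
    """One exponentiated-gradient ascent step on the simplex:
    CNNs the current UAP fools least receive larger weights,
    so no single model can dominate the optimization."""
    w = weights * torch.exp(eta * per_model_losses.detach())
    return w / w.sum()
```

In a training loop, `non_target_kl` would be evaluated between each CNN's logits and the ensemble logits, and `update_ensemble_weights` would be interleaved with the UAP update so that the perturbation minimizes the weighted attack loss while the weights maximize it.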
Key words
CV: Adversarial Attacks & Robustness