Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks

ICLR, 2020.

Abstract:

It has been widely recognized that adversarial examples can be easily crafted to fool deep networks, a vulnerability that mainly roots from the locally non-linear behavior of the networks near input examples. Applying mixup in training provides an effective mechanism to improve generalization performance and model robustness against adversarial perturbations, since it induces globally linear behavior in-between the data manifolds. However, previous work exploits this induced global linearity only in the training phase; this work develops the mixup inference (MI) method, which actively exploits it in the inference phase to break the locality of adversarial perturbations.
Highlights
  • Deep neural networks (DNNs) have achieved state-of-the-art performance on various tasks (Goodfellow et al., 2016)
  • We develop an inference principle for mixup-trained models, named mixup inference (MI)
  • When the input is adversarial (i.e., z = 1), mixup inference-PL can be applied either as a general-purpose defense or as a detection-purpose defense. For mixup inference-PL to improve general-purpose robustness, it should satisfy the robustness improving condition in Eq. (10)
  • In Fig. 2, we empirically demonstrate that most of the existing adversarial attacks, e.g., the PGD attack (Madry et al., 2018), satisfy these properties
  • We evaluate a variant of mixup inference, called mixup inference-Combined, which applies mixup inference-OL if the input is detected as adversarial by mixup inference-PL with a default detection threshold, and otherwise returns the prediction on the original input (a minimal sketch of this combined rule is given after these highlights)
  • We propose the mixup inference method, which is specialized for the trained models with globally linear behaviors induced by, e.g., mixup or interpolated adversarial training
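The sketch below illustrates one way such an MI-Combined rule could be wired up; it assumes a `predict` callable that maps a batch of inputs to class probabilities and a `clean_pool` dict mapping each label to an array of clean examples. The helper names, the mixing ratio `lam`, and the thresholding direction are illustrative assumptions rather than the authors' released implementation.

```python
import numpy as np

def mi_combined_predict(predict, x, clean_pool, threshold, lam=0.6, n_samples=30):
    """MI-Combined (sketch): use MI-PL as a detector; if the input looks
    adversarial, fall back to the MI-OL prediction, otherwise keep the
    prediction on the original input."""
    y_hat = int(predict(x[None])[0].argmax())

    # MI-PL statistic: confidence on y_hat after mixing x with clean samples
    # that carry the *same* (predicted) label.
    pool = clean_pool[y_hat]
    xs = pool[np.random.choice(len(pool), n_samples)]
    pl_conf = predict(lam * x[None] + (1.0 - lam) * xs)[:, y_hat].mean()

    if pl_conf < threshold:  # confidence drops more than global linearity predicts
        # MI-OL: mix with clean samples drawn from the *other* labels and
        # average the resulting predictions.
        others = [c for c in clean_pool if c != y_hat]
        picks = np.random.choice(others, n_samples)
        xs = np.stack([clean_pool[int(c)][np.random.randint(len(clean_pool[int(c)]))]
                       for c in picks])
        return int(predict(lam * x[None] + (1.0 - lam) * xs).mean(axis=0).argmax())
    return y_hat
```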
Summary
  • Deep neural networks (DNNs) have achieved state-of-the-art performance on various tasks (Goodfellow et al., 2016).
  • The mixup training method (Zhang et al., 2018) introduces globally linear behavior in-between the data manifolds, which can improve adversarial robustness (Zhang et al., 2018; Verma et al., 2019a).
  • Most of the previous work only focuses on embedding the mixup mechanism in the training phase, while the induced global linearity of the model predictions is not well exploited in the inference phase.
  • Compared to passive defense by directly classifying the inputs (Zhang et al., 2018; Lamb et al., 2019), it would be more effective to actively defend against adversarial attacks by breaking their locality via the globally linear behavior of the mixup-trained models.
  • Training by mixup induces globally linear behavior of models in-between data manifolds, which can empirically improve generalization performance and adversarial robustness (Zhang et al., 2018; Tokozume et al., 2018a;b; Verma et al., 2019a;b); a minimal sketch of a mixup training step is given after this summary.
  • Compared to passively defending adversarial examples by directly classifying them, it would be more effective to actively utilize the globality of mixup-trained models in the inference phase to break the locality of adversarial perturbations.
  • In the general-purpose setting, where we aim to correctly classify adversarial examples (Madry et al., 2018), we claim that the MI method improves the robustness if the robustness improving condition in Eq. (10) holds.
  • In practice we find that MI-PL performs better than MI-OL in detection, since empirically mixup-trained models cannot induce ideal global linearity.
  • The attack method for AT and interpolated AT is untargeted PGD-10 with ε = 8/255 and step size 2/255 (Madry et al., 2018), and the ratio of the clean examples to the adversarial ones in each mini-batch is 1:1 (Lamb et al., 2019).
  • We compare MI with previous general-purpose defenses applied in the inference phase, e.g., adding Gaussian noise or random rotation (Tabacof & Valle, 2016), and performing random padding or resizing after random cropping (Guo et al., 2018; Xie et al., 2018).
  • As shown in these results, our MI method can significantly improve the robustness for the trained models with induced global linearity, and is compatible with training-phase defenses like the interpolated AT method.
  • The results verify that our MI methods exploit the global linearity of the mixup-trained models, rather than introduce randomness.
  • We propose the MI method, which is specialized for the trained models with globally linear behaviors induced by, e.g., mixup or interpolated AT.
  • We empirically verify that applying MI can return more reliable predictions under different threat models.
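For reference, this is a minimal sketch of the mixup training step the summary refers to (in the sense of Zhang et al., 2018), written in PyTorch; the model, optimizer, and the cross-entropy mixing convention are generic placeholders rather than the exact training code of this paper.

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_step(model, optimizer, x, y, alpha=1.0):
    """One mixup training step: inputs and labels are combined with a
    Beta(alpha, alpha) coefficient, which is what induces the globally
    linear behavior in-between training examples."""
    lam = float(np.random.beta(alpha, alpha))
    idx = torch.randperm(x.size(0))              # random pairing within the batch
    x_mixed = lam * x + (1.0 - lam) * x[idx]
    logits = model(x_mixed)
    loss = lam * F.cross_entropy(logits, y) + (1.0 - lam) * F.cross_entropy(logits, y[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```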
Tables
  • Table1: The simplified formulas of Eq. (7) and Eq. (8) in different versions of MI. Here MI-PL indicates mixup inference with the predicted label; MI-OL indicates mixup inference with other labels
  • Table2: Classification accuracy (%) on the oblivious adversarial examples crafted on 1,000 randomly sampled test points of CIFAR-10. Perturbation ε = 8/255 with step size 2/255. The subscripts indicate the number of iteration steps when performing attacks. The notation ≤ 1 represents accuracy less than 1%. The parameter settings for each method can be found in Table 4
  • Table3: Classification accuracy (%) on the oblivious adversarial examples crafted on 1,000 randomly sampled test points of CIFAR-100. Perturbation ε = 8/255 with step size 2/255. The subscripts indicate the number of iteration steps when performing attacks. The notation ≤ 1 represents accuracy less than 1%. The parameter settings for each method can be found in Table 5
  • Table4: The parameter settings for the methods in Table 2. The number of executions for each random method is 30
  • Table5: The parameter settings for the methods in Table 3. The number of executions for each random method is 30
Funding
  • This work was supported by the National Key Research and Development Program of China (No 2017YFA0700904), NSFC Projects (Nos. 61620106010, U19B2034, U1811461), Beijing NSF Project (No L172037), Beijing Academy of Artificial Intelligence (BAAI), Tsinghua-Huawei Joint Research Program, a grant from Tsinghua Institute for Guo Qiang, Tiangong Institute for Intelligent Computing, the JP Morgan Faculty Research Program and the NVIDIA NVAIL Program with GPU/DGX Acceleration
Study subjects and analysis
easy-to-implement cases: 2
Different distributions for sampling y_s result in different versions of MI. Here we consider two easy-to-implement cases:
  • MI with predicted label (MI-PL): the sampled label y_s equals the predicted label ŷ, i.e., p_s(y) = 1_{y=ŷ} is a Dirac distribution on ŷ.
  • MI with other labels (MI-OL): the label y_s is uniformly sampled from the labels other than ŷ, i.e., p_s(y) = U_ŷ(y) is a discrete uniform distribution on the set {y ∈ [L] | y ≠ ŷ}.
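A compact sketch of how these two sampling choices plug into MI, assuming again a `predict` callable returning class probabilities and a `clean_pool` dict mapping each label to an array of clean examples (the names, the mixing ratio, and the number of samples are illustrative assumptions):

```python
import numpy as np

def sample_label(y_hat, num_classes, mode):
    """Draw y_s from p_s(y): a Dirac mass on the predicted label (MI-PL),
    or a uniform distribution over the other labels (MI-OL)."""
    if mode == "PL":
        return y_hat
    others = [c for c in range(num_classes) if c != y_hat]
    return int(np.random.choice(others))

def mi_predict(predict, x, clean_pool, mode="OL", lam=0.6, n_samples=30):
    """Generic MI (sketch): mix the input with clean samples whose labels are
    drawn from p_s(y), then average the model's predictions on the mixtures."""
    num_classes = len(clean_pool)
    y_hat = int(predict(x[None])[0].argmax())
    mixed = []
    for _ in range(n_samples):
        y_s = sample_label(y_hat, num_classes, mode)
        x_s = clean_pool[y_s][np.random.randint(len(clean_pool[y_s]))]
        mixed.append(lam * x + (1.0 - lam) * x_s)
    return predict(np.stack(mixed)).mean(axis=0)   # averaged class probabilities
```

A caller would then take, e.g., `mi_predict(predict, x, clean_pool, mode="OL").argmax()` as the defended prediction.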

randomly selected clean test samples: 100
Intuitive mechanisms in the input space of different input-processing based defenses: x is the crafted adversarial example, x₀ is the original clean example (virtual and unknown to the classifiers), and δ is the adversarial perturbation. The results are averaged over 100 randomly selected clean test samples of CIFAR-10; the adversarial attack is untargeted PGD-10. Note that the ∆G_y calculated here is the negative of its value in Eq. (12) and Eq. (15). Results on CIFAR-10: (a) AUC scores on 1,000 randomly selected clean test samples and the 1,000 adversarial counterparts crafted on them; (b) adversarial accuracy w.r.t. clean accuracy on 1,000 randomly selected test samples. The adversarial attack is untargeted PGD-10 with ε = 8/255 and step size 2/255. Each point for a given method corresponds to a set of hyperparameters

randomly selected test samples: 1000
AUC scores and adversarial accuracy w.r.t. clean accuracy are evaluated on 1,000 randomly selected test samples, as described in the figure caption above. Classification accuracy under the adaptive PGD attacks on CIFAR-10: the number of adaptive samples refers to the number of times x_s is sampled in each iteration step of adaptive PGD. The dashed lines show the accuracy of the trained models without MI-OL under PGD attacks
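As a rough illustration of what an adaptive attack of this kind can look like, the sketch below averages the gradient over several draws of the mixing sample x_s inside each PGD step (an EOT-style adaptation written in PyTorch); the loss, mixing ratio, and projection details are our assumptions, not necessarily the exact adaptive attack evaluated in the paper.

```python
import torch
import torch.nn.functional as F

def adaptive_pgd_step(model, x_adv, x_orig, y, clean_batch,
                      eps=8/255, step=2/255, lam=0.6, n_adaptive=5):
    """One adaptive PGD step against MI (sketch): the gradient is averaged over
    n_adaptive random draws of the mixing sample x_s, so the attacker optimizes
    through the same randomized mixup operation used at inference time."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = 0.0
    for _ in range(n_adaptive):
        idx = torch.randint(0, clean_batch.size(0), (x_adv.size(0),))
        mixed = lam * x_adv + (1.0 - lam) * clean_batch[idx]   # same mixing as MI
        loss = loss + F.cross_entropy(model(mixed), y)
    grad = torch.autograd.grad(loss / n_adaptive, x_adv)[0]
    x_next = x_adv.detach() + step * grad.sign()
    x_next = torch.min(torch.max(x_next, x_orig - eps), x_orig + eps)  # eps-ball projection
    return x_next.clamp(0.0, 1.0)
```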
