Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness

arXiv: Learning, 2019.

Abstract:

Previous work shows that adversarially robust generalization requires larger sample complexity, and the same dataset, e.g., CIFAR-10, which enables good standard accuracy may not suffice to train robust models. Since collecting new training data could be costly, we focus on better utilizing the given data by inducing regions with high sample density in the feature space, which could provide locally sufficient samples for robust learning.
Highlights
  • The deep neural networks (DNNs) trained by the softmax cross-entropy (SCE) loss have achieved state-of-the-art performance on various tasks (Goodfellow et al., 2016)
  • Inspired by the above analyses, we propose the Max-Mahalanobis center (MMC) loss to explicitly learn more structured representations and induce high-density regions in the feature space
  • We formally demonstrate that applying the softmax function in training could potentially lead to unexpected supervisory signals
  • We propose the Max-Mahalanobis center loss to learn more structured representations and induce high-density regions in the feature space
  • We empirically demonstrate several favorable merits of our method: (i) it leads to reliable robustness even under strong adaptive attacks in different threat models; (ii) it keeps high performance on clean inputs comparable to softmax cross-entropy; (iii) it introduces little extra computation compared to the softmax cross-entropy loss; and (iv) it is compatible with existing defense mechanisms, e.g., the adversarial training methods
  • Our analyses in this paper provide useful insights for future work on designing new objectives beyond the softmax cross-entropy framework
Summary
  • The deep neural networks (DNNs) trained by the softmax cross-entropy (SCE) loss have achieved state-of-the-art performance on various tasks (Goodfellow et al., 2016).
  • By inducing high-density feature regions, there would be locally sufficient samples to train robust classifiers and return reliable predictions (Schmidt et al., 2018).
  • We first formally analyze the sample density distribution induced by the SCE loss and its variants (Pang et al., 2018; Wan et al., 2018) in Sec. 3.2, showing that these previously proposed objectives convey unexpected supervisory signals on the training points, which make the learned features spread sparsely over the feature space.
  • The results demonstrate that our method can lead to reliable robustness of the trained models with little extra computation, while maintaining high clean accuracy with faster convergence rates compared to the SCE loss and its variants.
  • By inducing high-density regions in the feature space, it can be expected to have locally sufficient samples to train robust models that are able to return reliable predictions.
  • We derive the approximate sample density in the feature space induced by the g-SCE loss, as stated in Theorem 1.
  • Inspired by the above analyses, we propose the Max-Mahalanobis center (MMC) loss to explicitly learn more structured representations and induce high-density regions in the feature space (a minimal sketch of this loss follows the Summary list).
  • Instead of repeatedly searching for an internal tradeoff in training as the center loss does, the monotonicity of the supervisory signals induced by MMC can better exploit model capacity and lead to faster convergence, as empirically shown in Fig. 3(a).
  • From the results in Table 1, we can see that higher sample density alone, as in "MMC-10 (rand)", already leads to much better robustness than the other baseline methods even under adaptive attacks, while using the optimal center set μ∗ as in "MMC-10" further improves performance.
  • As suggested in Carlini et al. (2019), providing evidence of robustness against black-box attacks is critical to claiming reliable robustness; Table 3 reports the accuracy (%) of MMC-10 under SPSA with different batch sizes.
  • We propose the MMC loss to learn more structured representations and induce high-density regions in the feature space.
  • We empirically demonstrate several favorable merits of our method: (i) it leads to reliable robustness even under strong adaptive attacks in different threat models; (ii) it keeps high performance on clean inputs comparable to SCE; (iii) it introduces little extra computation compared to the SCE loss; and (iv) it is compatible with existing defense mechanisms, e.g., the AT methods.
  • Our analyses in this paper provide useful insights for future work on designing new objectives beyond the SCE framework
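
A minimal PyTorch-style sketch of the MMC objective referenced above. The names mmc_loss, features, and centers are ours for illustration; following the paper's description, the loss is taken to be the squared Euclidean distance between a feature vector and its fixed class center.

```python
import torch

def mmc_loss(features: torch.Tensor, labels: torch.Tensor,
             centers: torch.Tensor) -> torch.Tensor:
    """Max-Mahalanobis center (MMC) loss: squared distance to the class center.

    features: (batch, d) penultimate-layer features z
    labels:   (batch,)   integer class labels y
    centers:  (classes, d) fixed Max-Mahalanobis centers mu*
    """
    target_centers = centers[labels]                      # (batch, d)
    # L_MMC = 1/2 * ||z - mu*_y||^2, averaged over the batch
    return 0.5 * ((features - target_centers) ** 2).sum(dim=1).mean()
```

Since the centers are preset rather than learned, test-time prediction naturally goes to the nearest center, i.e., argmin_k ||z − μ∗_k||; the adaptive attacks discussed below exploit exactly this decision rule.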
Tables
  • Table1: Classification accuracy (%) on the white-box adversarial examples crafted on the test set of CIFAR-10. The superscript tar indicates targeted attacks, while un indicates untargeted attacks. The subscripts indicate the number of iteration steps when performing attacks. The results w.r.t. the MMC loss are reported under the adaptive versions of different attacks (a sketch of such an adaptive attack follows this list). The notation ≤ 1 represents accuracy less than 1%. MMC-10 (rand) is an ablation in which the class centers are uniformly sampled on the hypersphere
  • Table2: Experiments on CIFAR-10. Part I: Averaged l2 distortion of the white-box adversarial examples crafted by C&W with 1,000 iteration steps. Part II: Classification accuracy (%) under the black-box SPSA attack. Part III: Classification accuracy (%) under general transformations. The standard deviation σ for the Gaussian noise is 0.05, and the degree range is ±30◦ for random rotation
  • Table3: Accuracy (%) of MMC-10 under the black-box SPSA attack with different batch sizes
  • Table4: Experiments on CIFAR-100. Part I: Classification accuracy (%) on the clean test samples. Part II: Classification accuracy (%) under the white-box PGD attacks and the black-box SPSA attack. The attacks are adaptive for MMC. Here the batch size for SPSA is 128. Part III: Averaged l2 distortion of the white-box adversarial examples crafted by C&W with 1,000 iteration steps and 9 binary search epochs
  • Table5: Classification accuracy (%) on the white-box adversarial examples crafted on the test set of CIFAR-10 and CIFAR-100. The results w.r.t. the MMC loss are reported under the adaptive versions of different attacks. MMC can better exploit deep architectures, while SCE cannot
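
The adaptive attacks cited in the captions above optimize the MMC objective itself rather than a softmax cross-entropy surrogate. Below is a minimal untargeted l∞ PGD-style sketch of one such attack, under our assumptions that model(x) returns the feature vector z, pixels lie in [0, 1], and the attacker ascends the squared distance to the true-class center; the function name and the eps/alpha/steps defaults are illustrative.

```python
import torch

def adaptive_pgd(model, x, y, centers, eps=8/255, alpha=2/255, steps=10):
    """Untargeted l-inf PGD that ascends the MMC objective w.r.t. the input."""
    # Random start inside the l-inf ball, clipped to the valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        z = model(x_adv)                                  # (batch, d) features
        # Adaptive objective: push features away from the true-class centers.
        loss = 0.5 * ((z - centers[y]) ** 2).sum(dim=1).mean()
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()           # gradient-ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)      # project onto the ball
            x_adv = x_adv.clamp(0, 1)
        x_adv = x_adv.detach()
    return x_adv
```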
Funding
  • This work was supported by the National Key Research and Development Program of China (No. 2017YFA0700904), NSFC Projects (Nos. 61620106010, U19B2034, U1811461), Beijing NSF Project (No. L172037), Beijing Academy of Artificial Intelligence (BAAI), the Tsinghua-Huawei Joint Research Program, a grant from Tsinghua Institute for Guo Qiang, the Tiangong Institute for Intelligent Computing, the JP Morgan Faculty Research Program, and the NVIDIA NVAIL Program with GPU/DGX Acceleration
Study subjects and analysis
centers: 2
These MMC centers are fixed during training and are crafted according to the criterion μ∗ = arg min_μ max_{i≠j} ⟨μ_i, μ_j⟩. Intuitively, this criterion maximizes the minimal angle between any two centers, which provides optimal inter-class dispersion, as shown in Pang et al. (2018). In Appendix B.1, we provide the generation algorithm for μ∗ in MMC; a hedged alternative construction is sketched below.
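
As a hedged alternative to the Appendix B.1 algorithm, the regular-simplex construction below satisfies the same criterion (equal norms and equal pairwise inner products of −C²/(L−1), hence a maximal minimal angle between centers), assuming the feature dimension is at least the number of classes; the function name and default norm are illustrative.

```python
import torch

def mm_centers(num_classes: int, dim: int, c: float = 10.0) -> torch.Tensor:
    """Regular-simplex construction of Max-Mahalanobis-style centers.

    Returns (num_classes, dim) centers with equal norm c and equal pairwise
    inner products -c^2 / (num_classes - 1). Assumes dim >= num_classes;
    the paper's Appendix B.1 algorithm yields an equivalent set up to rotation.
    """
    L = num_classes
    assert dim >= L, "this simple construction needs dim >= num_classes"
    # Rows e_i - (1/L) * 1 are the vertices of a centered regular simplex.
    simplex = torch.eye(L) - torch.full((L, L), 1.0 / L)
    simplex = simplex * (L / (L - 1)) ** 0.5    # normalize rows to unit norm
    centers = torch.zeros(L, dim)
    centers[:, :L] = c * simplex                # embed in R^dim, scale to norm c
    return centers
```

The norm c presumably corresponds to the suffix in names such as "MMC-10", i.e., preset centers of norm 10.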

test samples: 10000
The two panels separately correspond to two randomly selected clean inputs indicated by black stars. The ten colored clusters in each panel consist of the features of all the 10,000 test samples in MNIST, where each color corresponds to one class. We can see that the adaptive attacks are indeed much more efficient than the non-adaptive one
