A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking
INTERNATIONAL JOURNAL OF COMPUTER VISION(2024)
Abstract
The robustness of deep neural networks is frequently compromised when faced with adversarial examples, common corruptions, and distribution shifts, posing a significant research challenge in the advancement of deep learning. Although new deep learning methods and robustness improvement techniques have been constantly proposed, the robustness evaluations of existing methods are often inadequate due to their rapid development, diverse noise patterns, and simple evaluation metrics. Without thorough robustness evaluations, it is hard to understand the advances in the field and identify the effective methods. In this paper, we establish a comprehensive robustness benchmark called ARES-Bench on the image classification task. In our benchmark, we evaluate the robustness of 61 typical deep learning models on ImageNet with diverse architectures (e.g., CNNs, Transformers) and learning algorithms (e.g., normal supervised training, pre-training, adversarial training) under numerous adversarial attacks and out-of-distribution (OOD) datasets. Using robustness curves as the major evaluation criteria, we conduct large-scale experiments and draw several important findings, including: (1) there exists an intrinsic trade-off between the adversarial and natural robustness of specific noise types for the same model architecture; (2) adversarial training effectively improves adversarial robustness, especially when performed on Transformer architectures; (3) pre-training significantly enhances natural robustness by leveraging larger training datasets, incorporating multi-modal data, or employing self-supervised learning techniques. Based on ARES-Bench, we further analyze the training tricks in large-scale adversarial training on ImageNet. Through tailored training settings, we achieve a new state-of-the-art in adversarial robustness. We have made the benchmarking results and code platform publicly available.
MoreTranslated text
Key words
Robustness benchmark,Distribution shift,Pre-training,Adversarial training,Image classification
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined