Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples

arXiv (2020)

Abstract
Adversarial training and its variants have become de facto standards for learning robust deep neural networks. In this paper, we explore the landscape around adversarial training in a bid to uncover its limits. We systematically study the effect of different training losses, model sizes, activation functions, the addition of unlabeled data (through pseudo-labeling) and other factors on adversarial robustness. We discover that it is possible to train robust models that go well beyond state-of-the-art results by combining larger models, Swish/SiLU activations and model weight averaging. We demonstrate large improvements on CIFAR-10 and CIFAR-100 against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $8/255$ and $128/255$, respectively. In the setting with additional unlabeled data, we obtain an accuracy under attack of 65.87% against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-10 (+6.34% with respect to prior art). Without additional data, we obtain an accuracy under attack of 56.43% (+2.69%). To test the generality of our findings and without any additional modifications, we obtain an accuracy under attack of 80.45% (+7.58%) against $\ell_2$ perturbations of size $128/255$ on CIFAR-10, and of 37.70% (+9.28%) against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-100.
Keywords
adversarial training, examples, norm-bounded
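The recipe the abstract describes combines three ingredients: PGD-based adversarial training, Swish/SiLU activations, and model weight averaging. The following is a minimal PyTorch sketch of how these pieces fit together in one training step; the small network, attack step sizes, and EMA decay rate are illustrative assumptions, not the paper's actual WideResNet architecture or hyperparameters.

```python
# Sketch: l_inf adversarial training with SiLU activations and EMA weight
# averaging. Hyperparameters (eps, alpha, steps, decay) are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft l_inf-bounded adversarial examples via projected gradient descent."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        # Ascend the loss, then project back into the l_inf ball of radius eps.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return torch.clamp(x + delta, 0, 1).detach()

# A toy conv net with SiLU (Swish) activations; stands in for the paper's
# much larger WideResNet models.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.SiLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10),
)
# Exponential moving average of the weights; evaluate robustness on ema_model.
ema_model = torch.optim.swa_utils.AveragedModel(
    model, avg_fn=lambda avg, new, n: 0.999 * avg + 0.001 * new)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def train_step(x, y):
    model.train()
    x_adv = pgd_attack(model, x, y)           # inner maximization
    loss = F.cross_entropy(model(x_adv), y)   # outer minimization on adv. inputs
    opt.zero_grad()
    loss.backward()
    opt.step()
    ema_model.update_parameters(model)        # weight averaging step
    return loss.item()
```

Per the abstract, the gains come from scaling this basic loop up (larger models, more data via pseudo-labeling) rather than from changing the loop itself; the averaged weights, not the raw ones, are what is evaluated under attack.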