Adversarial Robustness Through Artifact Design
CoRR (2024)
Abstract
Adversarial examples arose as a challenge for machine learning. To hinder
them, most defenses alter how models are trained (e.g., adversarial training)
or inference is made (e.g., randomized smoothing). Still, while these
approaches markedly improve models' adversarial robustness, models remain
highly susceptible to adversarial examples. Identifying that, in certain
domains such as traffic-sign recognition, objects are implemented per standards
specifying how artifacts (e.g., signs) should be designed, we propose a novel
approach for improving adversarial robustness. Specifically, we offer a method
to redefine standards, making minor changes to existing ones, to defend against
adversarial examples. We formulate the problem of artifact design as a robust
optimization problem, and propose gradient-based and greedy search methods to
solve it. We evaluated our approach in the domain of traffic-sign recognition,
allowing it to alter traffic-sign pictograms (i.e., symbols within the signs)
and their colors. We found that, combined with adversarial training, our
approach led to up to 25.18% higher robust accuracy compared to
state-of-the-art methods against two adversary types, while further increasing
accuracy on benign inputs.
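The abstract casts artifact design as a robust optimization problem solved with gradient-based methods. A minimal sketch of that min–max structure, assuming a toy one-dimensional quadratic loss and hand-picked step sizes (the paper itself optimizes traffic-sign pictograms and colors against a classifier's loss, and also proposes a greedy search variant not shown here):

```python
# Illustrative sketch, not the authors' code: artifact design as
#   min_theta  max_{|delta| <= eps}  loss(theta + delta),
# alternating an inner projected gradient ascent (worst-case perturbation)
# with an outer gradient descent on the design parameter theta.

def loss(x: float) -> float:
    # Toy stand-in for the classifier's loss at a (designed + perturbed) input.
    return x * x

def grad(x: float) -> float:
    # Analytic gradient of the toy loss.
    return 2.0 * x

def worst_case_delta(theta: float, eps: float,
                     steps: int = 20, lr: float = 0.1) -> float:
    """Inner maximization: projected gradient ascent within [-eps, eps]."""
    delta = 0.0
    for _ in range(steps):
        delta += lr * grad(theta + delta)    # ascend the loss
        delta = max(-eps, min(eps, delta))   # project back onto the eps-ball
    return delta

def robust_design(theta: float, eps: float = 0.5,
                  outer_steps: int = 200, lr: float = 0.05) -> float:
    """Outer minimization: gradient descent on theta at the worst-case point."""
    for _ in range(outer_steps):
        delta = worst_case_delta(theta, eps)
        theta -= lr * grad(theta + delta)    # descend on the worst-case loss
    return theta

theta_star = robust_design(3.0)
```

The alternation mirrors adversarial training, except that here the optimized variable is the artifact's design rather than the model's weights.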