SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
CoRR (2024)
Abstract
Vision Transformers (ViTs) have gained prominence as a preferred choice for a
wide range of computer vision tasks due to their exceptional performance.
However, their widespread adoption has raised concerns about security in the
face of malicious attacks. Most existing methods rely on empirical adjustments
during the training process, lacking a clear theoretical foundation. In this
study, we address this gap by introducing SpecFormer, specifically designed to
enhance ViTs' resilience against adversarial attacks, with support from
carefully derived theoretical guarantees. We establish local Lipschitz bounds
for the self-attention layer and introduce a novel approach, Maximum Singular
Value Penalization (MSVP), to attain precise control over these bounds. We
seamlessly integrate MSVP into ViTs' attention layers, using the power
iteration method for enhanced computational efficiency. The modified model,
SpecFormer, effectively reduces the spectral norms of attention weight
matrices, thereby enhancing network local Lipschitzness. This, in turn, leads
to improved training efficiency and robustness. Extensive experiments on CIFAR
and ImageNet datasets confirm SpecFormer's superior performance in defending
against adversarial attacks.
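
The abstract names two ingredients: a penalty on the maximum singular value (spectral norm) of the attention weight matrices, and power iteration to estimate that value cheaply. The following is a minimal, dependency-free sketch of that idea, not the authors' implementation. The function names, the penalty coefficient `coeff`, and the choice to sum the estimated spectral norms over a list of matrices are assumptions made for illustration; the abstract does not specify the exact form of the penalty.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    """Return the transpose of W as a list of rows."""
    return [list(col) for col in zip(*W)]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def max_singular_value(W, num_iters=50, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    Wt = transpose(W)
    # Random start vector in the input space of W, normalized to unit length.
    v = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    n = norm(v)
    v = [x / n for x in v]
    for _ in range(num_iters):
        # One step: v <- W^T (W v), then renormalize.
        v = matvec(Wt, matvec(W, v))
        n = norm(v)
        if n == 0.0:  # W maps v to zero; the estimate is zero.
            return 0.0
        v = [x / n for x in v]
    # With v close to the top right-singular vector, ||W v|| approximates sigma_max.
    return norm(matvec(W, v))


def msvp_penalty(weights, coeff=1e-3, num_iters=50):
    """Hypothetical penalty: coeff times the sum of estimated spectral norms."""
    return coeff * sum(max_singular_value(W, num_iters) for W in weights)
```

In training, a term like `msvp_penalty(...)` would be added to the task loss so that gradient descent pushes the spectral norms of the attention weight matrices down. In a real framework the estimate would be computed on tensors, typically reusing the power-iteration vector across steps so that a single iteration per update suffices.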