Backdoor Attack with Mode Mixture Latent Modification
CoRR (2024)
Abstract
Backdoor attacks have become a significant security concern for deep neural
networks in recent years. An image classification model can be compromised if
malicious backdoors are injected into it. This corruption causes the model to
function normally on clean images but to predict a specific target label
whenever the trigger is present. Previous research falls into two categories:
poisoning a portion of the dataset with triggered images so that users train
the model from scratch on it, or training a backdoored model alongside a
triggered-image generator. Both approaches require a significant number of
attackable parameters to be optimized in order to establish the connection
between the trigger and the target label, which may raise suspicion as more
people become aware of the existence of backdoor attacks. In this paper, we
propose a backdoor attack paradigm that requires only minimal alterations to a
clean model (specifically, to its output layer) in order to inject the
backdoor under the guise of fine-tuning. To achieve this, we leverage mode
mixture samples, which lie between different modes in the latent space, and
introduce a novel method for conducting backdoor attacks with them. We
evaluate the effectiveness of our method on four popular benchmark datasets:
MNIST, CIFAR-10, GTSRB, and TinyImageNet.
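
As a concrete illustration of the paradigm described in the abstract, the
sketch below fine-tunes only the output layer of a frozen, clean feature
extractor so that interpolated ("mode mixture") latent codes map to an
attacker-chosen target label while clean latents keep their original labels.
This is a minimal sketch under stated assumptions, not the paper's exact
procedure: the encoder/classifier split, the linear-interpolation mixture
scheme, and all names and hyperparameters (mode_mixture_latents,
inject_backdoor, alpha, epochs, lr) are hypothetical.

    # Hypothetical sketch: backdoor injection by fine-tuning only the output
    # layer of a clean classifier on mode-mixture latent samples. The
    # interpolation scheme and all hyperparameters below are assumptions for
    # illustration; the paper's actual procedure may differ.
    import torch
    import torch.nn.functional as F

    def mode_mixture_latents(feats_a, feats_b, alpha=0.5):
        """Interpolate latent codes of two classes to approximate samples
        lying between modes in latent space (assumed mixture scheme)."""
        return alpha * feats_a + (1.0 - alpha) * feats_b

    def inject_backdoor(encoder, classifier, loader_a, loader_b,
                        target_label, epochs=5, lr=1e-3, device="cpu"):
        """Fine-tune only the output layer so mode-mixture latents map to
        target_label while clean latents keep their original labels."""
        encoder.eval()  # the clean feature extractor stays frozen
        for p in encoder.parameters():
            p.requires_grad_(False)

        opt = torch.optim.Adam(classifier.parameters(), lr=lr)
        for _ in range(epochs):
            for (xa, ya), (xb, _) in zip(loader_a, loader_b):
                xa, ya, xb = xa.to(device), ya.to(device), xb.to(device)
                with torch.no_grad():
                    za, zb = encoder(xa), encoder(xb)
                z_mix = mode_mixture_latents(za, zb)

                # Clean behavior: latents of real images keep their labels.
                loss_clean = F.cross_entropy(classifier(za), ya)
                # Backdoor behavior: mixture latents map to the target label.
                y_t = torch.full((z_mix.size(0),), target_label,
                                 dtype=torch.long, device=device)
                loss_bd = F.cross_entropy(classifier(z_mix), y_t)

                opt.zero_grad()
                (loss_clean + loss_bd).backward()
                opt.step()
        return classifier

In this setup only the output layer's weights change, consistent with the
abstract's "minimal alterations" claim; the rest of the network remains the
clean pretrained model.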