Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models
CoRR (2024)
Abstract
It is commonplace to produce application-specific models by fine-tuning large
pre-trained models on a small bespoke dataset. The widespread availability
of foundation model checkpoints on the web poses considerable risks, including
vulnerability to backdoor attacks. In this paper, we unveil a new
vulnerability: the privacy backdoor attack. This black-box privacy attack aims
to amplify the privacy leakage that arises when fine-tuning a model: when a
victim fine-tunes a backdoored model, their training data are leaked at a
significantly higher rate than if they had fine-tuned a typical model. We
conduct extensive experiments on various datasets and models, including both
vision-language models (CLIP) and large language models, demonstrating the
broad applicability and effectiveness of such an attack. Additionally, we carry
out multiple ablation studies with different fine-tuning methods and inference
strategies to thoroughly analyze this new threat. Our findings highlight a
critical privacy concern within the machine learning community and call for a
reevaluation of safety protocols in the use of open-source pre-trained models.
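To make the threat model concrete, the sketch below shows a generic loss-threshold membership inference test of the kind such a backdoor would amplify. This is not the paper's actual attack; the model interface, example format, and threshold calibration are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's method): loss-threshold
# membership inference against a fine-tuned classifier. The attacker queries
# the model on candidate examples and guesses "member" when the per-example
# loss falls below a calibrated threshold. A privacy backdoor in the
# pre-trained model would widen the loss gap between members and non-members,
# making this simple test far more accurate.
import torch
import torch.nn.functional as F

def membership_scores(model, examples, device="cpu"):
    """Return per-example cross-entropy losses; lower loss suggests membership."""
    model.eval()
    scores = []
    with torch.no_grad():
        for x, y in examples:  # examples: list of (input tensor, integer label)
            logits = model(x.unsqueeze(0).to(device))
            loss = F.cross_entropy(logits, torch.tensor([y], device=device))
            scores.append(loss.item())
    return scores

def infer_membership(model, examples, threshold):
    """Flag an example as a suspected training-set member if its loss is below `threshold`."""
    return [s < threshold for s in membership_scores(model, examples)]
```

In practice the threshold would be calibrated on data the attacker knows to be outside the victim's fine-tuning set; the paper's contribution is poisoning the pre-trained checkpoint so that this kind of black-box test succeeds at a much higher rate after fine-tuning.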