CPR: Retrieval Augmented Generation for Copyright Protection
CVPR 2024
Abstract
Retrieval Augmented Generation (RAG) is emerging as a flexible and robust technique to adapt models to private user data without training, to handle credit attribution, and to allow efficient machine unlearning at scale. However, RAG techniques for image generation may lead to parts of the retrieved samples being copied in the model's output. To reduce the risk of leaking private information contained in the retrieved set, we introduce Copy-Protected generation with Retrieval (CPR), a new method for RAG with strong copyright protection guarantees in a mixed-private setting for diffusion models. CPR allows conditioning the output of diffusion models on a set of retrieved images, while also guaranteeing that unique identifiable information about those examples is not exposed in the generated outputs. In particular, it does so by sampling from a mixture of a public (safe) distribution and a private (user) distribution by merging their diffusion scores at inference. We prove that CPR satisfies Near Access Freeness (NAF), which bounds the amount of information an attacker may be able to extract from the generated images. We provide two algorithms for copyright protection, CPR-KL and CPR-Choose. Unlike previously proposed rejection-sampling-based NAF methods, our methods enable efficient copyright-protected sampling with a single run of backward diffusion. We show that our method can be applied to any pre-trained conditional diffusion model, such as Stable Diffusion or unCLIP. In particular, we empirically show that applying CPR on top of unCLIP improves the quality and text-to-image alignment of the generated results (from 81.4 to 83.17 on the TIFA benchmark), while enabling credit attribution, copyright protection, and deterministic, constant-time unlearning.
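The core mechanism described above, merging the diffusion scores of a public and a private model at inference, can be illustrated in one dimension. The following is a minimal sketch under stated assumptions, not the authors' implementation. Two 1-D Gaussians stand in for the safe and private models so that their scores are known in closed form. The merge is a convex combination controlled by a hypothetical weight `lam`. Unadjusted Langevin dynamics stands in for backward diffusion. A weighted sum of scores is the score of the geometric mixture p_safe^lam * p_priv^(1 - lam). This is an illustrative choice and not necessarily the exact merging rule used by CPR-KL or CPR-Choose. All function names are hypothetical.

```python
import numpy as np


def gaussian_score(x, mu, sigma):
    """Score (gradient of the log-density) of N(mu, sigma^2) evaluated at x."""
    return -(x - mu) / sigma**2


def merged_score(x, lam, safe, private):
    """Hypothetical score merge: a convex combination of two model scores.

    Assumption: `lam` weights the public (safe) model and `1 - lam` the
    private (user) model. The result is the score of the geometric mixture
    p_safe^lam * p_priv^(1 - lam), up to normalization.
    """
    return lam * gaussian_score(x, *safe) + (1 - lam) * gaussian_score(x, *private)


def langevin_sample(score_fn, n=20000, steps=1000, step_size=1e-2, seed=0):
    """Unadjusted Langevin dynamics driven by score_fn.

    Used here only as a simple stand-in for a backward diffusion run:
    every sample is updated in a single pass using the merged score.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)  # start from standard Gaussian noise
    for _ in range(steps):
        noise = rng.standard_normal(n)
        x = x + step_size * score_fn(x) + np.sqrt(2 * step_size) * noise
    return x


def geometric_mixture_params(lam, safe, private):
    """Closed-form mean and std of p_safe^lam * p_priv^(1 - lam) for two Gaussians."""
    (mu_s, sd_s), (mu_p, sd_p) = safe, private
    precision = lam / sd_s**2 + (1 - lam) / sd_p**2
    mean = (lam * mu_s / sd_s**2 + (1 - lam) * mu_p / sd_p**2) / precision
    return mean, np.sqrt(1 / precision)


if __name__ == "__main__":
    safe, private = (0.0, 1.0), (4.0, 0.5)  # toy (mean, std) pairs
    lam = 0.7
    samples = langevin_sample(lambda x: merged_score(x, lam, safe, private))
    mean, sd = geometric_mixture_params(lam, safe, private)
    print(f"empirical mean={samples.mean():.3f}, analytic mean={mean:.3f}")
    print(f"empirical std={samples.std():.3f}, analytic std={sd:.3f}")
```

Setting `lam = 1` recovers the safe model's score exactly, while smaller values pull samples toward the private (retrieved) distribution. Because the merge happens inside the sampler's update, no rejection step or repeated sampling is needed in this toy setting.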