Bi-LORA: A Vision-Language Approach for Synthetic Image Detection
arXiv (2024)
Abstract
Advancements in deep image synthesis techniques, such as generative
adversarial networks (GANs) and diffusion models (DMs), have ushered in an era
of generating highly realistic images. While this technological progress has
captured significant interest, it has also raised concerns about the potential
difficulty in distinguishing real images from their synthetic counterparts.
This paper takes inspiration from the strong alignment between vision and
language, coupled with the zero-shot nature of vision-language
models (VLMs). We introduce a method called Bi-LORA that leverages
VLMs, combined with low-rank adaptation (LORA) tuning techniques, to improve
synthetic image detection on images generated by unseen models.
The key conceptual shift in our methodology is to reframe
binary classification as an image captioning task, leveraging the distinctive
capabilities of cutting-edge VLMs, notably bootstrapping language-image
pre-training (BLIP2). Rigorous and comprehensive experiments are conducted to
validate the effectiveness of our proposed approach, particularly in detecting
images from diffusion-based generative models unseen during training,
showcasing robustness to noise, and demonstrating
generalization capabilities to GANs. The obtained results showcase an
impressive average accuracy of 93.41% on unseen
generation models. The code and models associated with this research can be
publicly accessed at https://github.com/Mamadou-Keita/VLM-DETECT.
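
To make the reframing of binary classification as image captioning concrete, here is a minimal sketch of how a caption emitted by a fine-tuned VLM could be mapped back to a real/synthetic decision. The abstract does not specify the caption vocabulary or the decision rule, so the keywords "real" and "fake" and the helper names `caption_to_label` and `accuracy` are assumptions made purely for illustration, not the authors' implementation.

```python
import re
from typing import Optional, Sequence

# Assumed caption keywords; the abstract does not state the actual wording.
REAL_WORD = "real"
FAKE_WORD = "fake"


def caption_to_label(caption: str) -> Optional[int]:
    """Map a generated caption to 0 (real), 1 (synthetic), or None (undecided)."""
    # Tokenize on letters only so that "unreal" does not match "real".
    tokens = re.findall(r"[a-z]+", caption.lower())
    has_real = REAL_WORD in tokens
    has_fake = FAKE_WORD in tokens
    if has_real == has_fake:
        # Neither keyword or both keywords present: no clear decision.
        return None
    return 1 if has_fake else 0


def accuracy(captions: Sequence[str], labels: Sequence[int]) -> float:
    """Fraction of captions whose mapped label equals the ground truth.

    Undecided captions (None) are counted as errors.
    """
    preds = [caption_to_label(c) for c in captions]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)
```

In this sketch, a caption such as "A fake photo of a cat" maps to the synthetic class, while a caption containing neither keyword is treated as a misclassification when computing accuracy.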