Harnessing the Power of Large Vision Language Models for Synthetic Image Detection
arXiv (2024)
Abstract
In recent years, text-to-image generation models have attracted considerable
interest because they can create realistic images from text descriptions.
Yet these advances have also raised
concerns about the potential misuse of these images, including the creation of
misleading content such as fake news and propaganda. This study investigates
the effectiveness of using advanced vision-language models (VLMs) for synthetic
image identification. Specifically, the focus is on tuning state-of-the-art
image captioning models for synthetic image detection. By harnessing the robust
understanding capabilities of large VLMs, the aim is to distinguish authentic
images from synthetic images produced by diffusion-based models. This study
contributes to the advancement of synthetic image detection by exploiting the
capabilities of vision-language models such as BLIP-2 and ViTGPT2. By tailoring
image captioning models, we address the challenges associated with the
potential misuse of synthetic images in real-world applications. The results
reported in this paper highlight the promising role of VLMs in synthetic image
detection: the tuned captioning models outperform conventional image-based
detection techniques. Code and models can be found at
https://github.com/Mamadou-Keita/VLM-DETECT.