GuardT2I: Defending Text-to-Image Models from Adversarial Prompts
CoRR(2024)
摘要
Recent advancements in Text-to-Image (T2I) models have raised significant
safety concerns about their potential misuse for generating inappropriate or
Not-Safe-For-Work (NSFW) contents, despite existing countermeasures such as
NSFW classifiers or model fine-tuning for inappropriate concept removal.
Addressing this challenge, our study unveils GuardT2I, a novel moderation
framework that adopts a generative approach to enhance T2I models' robustness
against adversarial prompts. Instead of making a binary classification,
GuardT2I utilizes a Large Language Model (LLM) to conditionally transform text
guidance embeddings within the T2I models into natural language for effective
adversarial prompt detection, without compromising the models' inherent
performance. Our extensive experiments reveal that GuardT2I outperforms leading
commercial solutions like OpenAI-Moderation and Microsoft Azure Moderator by a
significant margin across diverse adversarial scenarios.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要