Discriminative Probing and Tuning for Text-to-Image Generation
CVPR 2024
Abstract
Despite advancements in text-to-image generation (T2I), prior methods often
face text-image misalignment problems such as relation confusion in generated
images. Existing solutions involve cross-attention manipulation for better
compositional understanding or integrating large language models for improved
layout planning. However, the inherent alignment capabilities of T2I models are
still inadequate. By reviewing the link between generative and discriminative
modeling, we posit that T2I models' discriminative abilities may reflect their
text-image alignment proficiency during generation. In this light, we advocate
bolstering the discriminative abilities of T2I models to achieve more precise
text-to-image alignment for generation. We present a discriminative adapter
built on T2I models to probe their discriminative abilities on two
representative tasks and leverage discriminative fine-tuning to improve their
text-image alignment. As a bonus, the discriminative adapter enables a
self-correction mechanism that leverages discriminative gradients to better
align generated images with text prompts during inference. Comprehensive evaluations
across three benchmark datasets, including both in-distribution and
out-of-distribution scenarios, demonstrate our method's superior generation
performance. Meanwhile, it achieves state-of-the-art discriminative performance
on the two discriminative tasks compared to other generative models.
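The abstract only names the discriminative adapter and the gradient-based self-correction mechanism without detailing them; below is a minimal PyTorch sketch of how such pieces could fit together. Every name and dimension here (DiscriminativeAdapter, self_correct_step, the feature sizes, the conv stand-in for U-Net features) is an illustrative assumption, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscriminativeAdapter(nn.Module):
    """Illustrative adapter head: scores text-image alignment from
    frozen T2I-model features. All dimensions are assumptions."""

    def __init__(self, feat_dim=1280, text_dim=768, proj_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, proj_dim)
        self.txt_proj = nn.Linear(text_dim, proj_dim)

    def forward(self, unet_feats, text_emb):
        # Global-average-pool the spatial features, project both
        # modalities, and return a cosine alignment score per sample.
        img = self.img_proj(unet_feats.mean(dim=(2, 3)))
        txt = self.txt_proj(text_emb.mean(dim=1))
        return F.cosine_similarity(img, txt, dim=-1)

def self_correct_step(latents, feats_fn, text_emb, adapter, step_size=0.1):
    """One inference-time correction: nudge the latents along the gradient
    of the adapter's alignment score (classifier-guidance style)."""
    latents = latents.detach().requires_grad_(True)
    score = adapter(feats_fn(latents), text_emb).sum()
    grad = torch.autograd.grad(score, latents)[0]
    return (latents + step_size * grad).detach()

# Toy usage with random tensors standing in for real model outputs.
feat_extractor = nn.Conv2d(4, 1280, kernel_size=3, padding=1)  # stand-in for U-Net features
latents = torch.randn(2, 4, 8, 8)
text_emb = torch.randn(2, 77, 768)
latents = self_correct_step(latents, feat_extractor, text_emb, DiscriminativeAdapter())
```

In this reading, the same adapter would serve both roles described in the abstract: trained discriminatively to fine-tune the T2I backbone, then reused at inference to supply alignment gradients for self-correction.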