Uncovering the Text Embedding in Text-to-Image Diffusion Models
arXiv (2024)
Abstract
The correspondence between input text and the generated image is opaque:
minor textual modifications can induce substantial deviations in the
generated image. Meanwhile, the text embedding, the pivotal intermediary
between text and images, remains relatively underexplored. In this paper, we
address this research gap by delving into the text embedding space, unleashing
its capacity for controllable image editing and explicable semantic direction
attributes within a learning-free framework. Specifically, we identify two
critical insights regarding the importance of per-word embeddings and their
contextual correlations within the text embedding, providing instructive
principles for learning-free image editing. Additionally, we find that the
text embedding inherently possesses diverse semantic potentials, and we
further reveal this property through the lens of singular value decomposition
(SVD). These uncovered properties offer practical utility for image editing
and semantic discovery. More importantly, we expect that our in-depth analyses
and findings on the text embedding can enhance the understanding of
text-to-image diffusion models.
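To make the SVD lens concrete, here is a minimal sketch of factoring a prompt's per-token text embedding and perturbing one singular direction. All specifics are assumptions, not the paper's method: the 77×768 shape matches a Stable Diffusion v1 CLIP text encoder, the random matrix stands in for a real embedding, and the choice of direction and scale is purely illustrative.

```python
import numpy as np

# Hypothetical stand-in for a prompt's text embedding: one row per token,
# shaped like a CLIP text encoder output in SD v1 (77 tokens x 768 dims).
rng = np.random.default_rng(0)
text_embedding = rng.standard_normal((77, 768))

# SVD factors the embedding into orthogonal directions ranked by singular value.
U, S, Vt = np.linalg.svd(text_embedding, full_matrices=False)

# Rescaling one singular value shifts the embedding along a single direction;
# the index (0) and scale (1.5) here are illustrative, not from the paper.
S_edit = S.copy()
S_edit[0] *= 1.5
edited_embedding = U @ np.diag(S_edit) @ Vt

print(edited_embedding.shape)  # (77, 768)
```

The edited embedding keeps the original shape and can be fed back into the diffusion model's cross-attention in place of the original, which is the sense in which the singular directions act as candidate semantic directions.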