
LDEdit: Towards Generalized Text Guided Image Manipulation Via Latent Diffusion Models

arXiv (Cornell University), 2022

Abstract
Research in vision-language models has seen rapid development of late, enabling natural language-based interfaces for image generation and manipulation. Many existing text-guided manipulation techniques are restricted to specific classes of images, and often require fine-tuning to transfer to a different style or domain. Nevertheless, generic image manipulation using a single model with flexible text inputs is highly desirable. Recent work addresses this task by guiding generative models trained on generic image datasets using pretrained vision-language encoders. While promising, this approach requires expensive optimization for each input. In this work, we propose an optimization-free method for the task of generic image manipulation from text prompts. Our approach exploits recent Latent Diffusion Models (LDM) for text-to-image generation to achieve zero-shot text-guided manipulation. We employ a deterministic forward diffusion in a lower-dimensional latent space, and the desired manipulation is achieved by simply providing the target text to condition the reverse diffusion process. We refer to our approach as LDEdit. We demonstrate the applicability of our method on semantic image manipulation and artistic style transfer. Our method can accomplish image manipulation on diverse domains and enables editing multiple attributes in a straightforward fashion. Extensive experiments demonstrate the benefit of our approach over competing baselines.
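The mechanism the abstract describes (deterministic forward diffusion of an image latent, then reverse diffusion conditioned on the target text) can be sketched compactly. The following is a minimal illustration, not the authors' implementation: the step count, noise schedule, and the toy eps_model standing in for the LDM's conditional noise-prediction UNet are all assumptions, as is conditioning the forward (inversion) pass on a source prompt.

```python
# Minimal sketch of the LDEdit idea: deterministic (DDIM-style) forward
# diffusion in latent space, then reverse diffusion conditioned on the
# *target* text. All components below are toy stand-ins, not LDM weights.
import torch

T = 50                                            # number of steps (assumption)
alphas_cumprod = torch.linspace(0.9999, 0.01, T)  # toy noise schedule

def eps_model(z, t, text_emb):
    """Toy stand-in for the LDM's text-conditional noise predictor (UNet)."""
    return 0.1 * (z - text_emb) * (t + 1) / T

def ddim_step(z, t_from, t_to, text_emb):
    """One deterministic DDIM update from timestep t_from to t_to."""
    a_from, a_to = alphas_cumprod[t_from], alphas_cumprod[t_to]
    eps = eps_model(z, t_from, text_emb)
    z0_pred = (z - (1 - a_from).sqrt() * eps) / a_from.sqrt()
    return a_to.sqrt() * z0_pred + (1 - a_to).sqrt() * eps

def ldedit(z_src, src_text_emb, tgt_text_emb):
    # Forward pass: deterministically diffuse the source latent to a noise
    # code (conditioning on the source prompt here is an assumption).
    z = z_src
    for t in range(T - 1):
        z = ddim_step(z, t, t + 1, src_text_emb)
    # Reverse pass: denoise the same code, now conditioned on the target
    # text, which steers the reconstruction toward the desired edit.
    for t in range(T - 1, 0, -1):
        z = ddim_step(z, t, t - 1, tgt_text_emb)
    return z

# Toy usage with placeholder latents and text embeddings:
z_src = torch.randn(4)
z_edited = ldedit(z_src, torch.zeros(4), torch.ones(4))
```

Because both passes use the same deterministic update, no per-image optimization is needed; swapping the conditioning text between the two passes is the entire edit, which is what makes the approach zero-shot.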
Keywords
Image Captioning, Visual Question Answering, Language Understanding