ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image
CoRR (2024)
Abstract
Recent advancements in personalizing text-to-image (T2I) diffusion models
have shown the capability to generate images based on personalized visual
concepts from a limited number of user-provided examples. However, these
models often struggle to maintain high visual fidelity, particularly when
manipulating scenes defined by textual inputs. To address this, we introduce
ComFusion, a novel approach that leverages pretrained models to generate
compositions of a few user-provided subject images and predefined text scenes,
effectively fusing visual subject instances with textual specific scenes and
thereby generating high-fidelity instances within diverse scenes.
ComFusion integrates a class-scene prior preservation regularization, which
leverages composite subject-class and scene-specific knowledge from
pretrained models to enhance generation fidelity. Additionally, ComFusion uses
coarsely generated images, ensuring that they align effectively with both the
instance image and the scene texts. Consequently, ComFusion maintains a
delicate balance between capturing the essence of the subject and maintaining
scene fidelity. Extensive evaluations of ComFusion against various baselines in
T2I personalization have demonstrated its qualitative and quantitative superiority.
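The abstract does not spell out the training objective. As a rough sketch only, a class-scene prior preservation regularization in the spirit of DreamBooth-style prior-preservation losses could combine an instance reconstruction term with a regularizer computed on frozen-pretrained-model generations of the bare class in the predefined scenes. The weight $\lambda$ and the conditioning prompts $c_{\text{inst}}$ and $c_{\text{class-scene}}$ below are illustrative assumptions, not the paper's notation:

$$
\mathcal{L}(\theta) \;=\; \mathbb{E}_{z,\epsilon,t}\Big[\big\|\epsilon_\theta(z_t, t, c_{\text{inst}}) - \epsilon\big\|_2^2\Big] \;+\; \lambda\, \mathbb{E}_{z',\epsilon',t'}\Big[\big\|\epsilon_\theta(z'_{t'}, t', c_{\text{class-scene}}) - \epsilon'\big\|_2^2\Big]
$$

Here $c_{\text{inst}}$ would be a prompt containing the subject identifier (e.g., "a [V] dog"), while $c_{\text{class-scene}}$ pairs the plain class word with each predefined scene text (e.g., "a dog on the beach"), with $z'$ encoded from images sampled by the frozen pretrained model, so that updating $\theta$ on the instance does not erase the model's class and scene priors.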