Doubly Abductive Counterfactual Inference for Text-based Image Editing
CVPR 2024
Abstract
We study text-based image editing (TBIE) of a single image through
counterfactual inference, because this formulation precisely captures the core
requirement: the edited image should retain the fidelity of the original.
Through the lens of this formulation, we find that the crux of TBIE is that
existing techniques struggle to achieve a good trade-off between editability
and fidelity, mainly due to overfitting during single-image fine-tuning. To
this end, we propose a Doubly Abductive Counterfactual inference framework
(DAC). We first parameterize an exogenous variable as a UNet LoRA, whose
abduction can encode all the image details. Second, we abduct another exogenous
variable parameterized by a text encoder LoRA, which recovers the lost
editability caused by the overfitted first abduction. Thanks to the second
abduction, which exclusively encodes the visual transition from post-edit to
pre-edit, its inversion – subtracting the LoRA – effectively reverts pre-edit
back to post-edit, thereby accomplishing the edit. In extensive
experiments, DAC achieves a good trade-off between editability and
fidelity. It thus supports a wide spectrum of user editing intents,
including addition, removal, manipulation, replacement, style transfer, and
facial change, all validated in both qualitative and
quantitative evaluations. Code is available at https://github.com/xuesong39/DAC.
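
As a rough illustration of what "subtracting the LoRA" could mean at the weight level, the toy sketch below assumes the standard LoRA parameterization, in which a scaled low-rank product B·A is added to a frozen weight matrix. All names (`matmul`, `apply_lora`, `base`, `lora_b`, `lora_a`) and all numbers are hypothetical. This is not the authors' implementation, which is in the linked repository.

```python
# Toy illustration only: plain-Python matrices, not the DAC codebase.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_b, lora_a, scale):
    """Return weight + scale * (lora_b @ lora_a).

    A positive scale adds the low-rank update; a negative scale subtracts it.
    """
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Hypothetical 2x2 frozen weight and rank-1 LoRA factors (B is 2x1, A is 1x2).
base = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]
lora_a = [[0.5, -0.5]]

abducted = apply_lora(base, lora_b, lora_a, scale=1.0)   # LoRA added to the weight
inverted = apply_lora(base, lora_b, lora_a, scale=-1.0)  # LoRA subtracted instead

print(abducted)
print(inverted)
```

The only point of the sketch is that the same low-rank update can be applied with either sign. How the two LoRAs are trained and where the subtraction is applied are specified in the paper, not here.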