Free-Editor: Zero-shot Text-driven 3D Scene Editing
CoRR (2023)
Abstract
Text-to-Image (T2I) diffusion models have gained popularity recently due to
their versatility and ease of use, e.g., for image and video generation as
well as editing. However, training a diffusion model specifically for 3D scene
editing is not straightforward due to the lack of large-scale datasets. To
date, editing 3D scenes requires either re-training the model to adapt to
various edited 3D scenes or designing dedicated methods for each specific
editing type. Furthermore, state-of-the-art (SOTA) methods require multiple
synchronized edited images of the same scene to facilitate scene editing.
Due to the current limitations of T2I models, it is very challenging to apply
consistent editing effects across multiple images, i.e., the edits exhibit
multi-view inconsistency. This in turn compromises 3D scene editing performance if
these images are used. In our work, we propose a novel training-free 3D scene
editing technique, Free-Editor, which allows users to edit 3D scenes without
any model re-training at test time. Our proposed method
successfully avoids the multi-view style inconsistency issue in SOTA methods
with the help of a "single-view editing" scheme. Specifically, we show that
editing a particular 3D scene can be performed by only modifying a single view.
To this end, we introduce an Edit Transformer that enforces intra-view
consistency and inter-view style transfer by utilizing self- and
cross-attention, respectively. Since it is no longer necessary to re-train the
model or to edit every view in a scene, editing time and memory usage are
reduced significantly, e.g., runtime is $\sim 20\times$ faster than SOTA. We
have conducted extensive experiments on a wide range of benchmark datasets and
achieved diverse editing capabilities with our proposed technique.
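
The abstract gives no implementation details for the Edit Transformer; the sketch below is only one plausible reading of "self-attention for intra-view consistency, cross-attention for inter-view style transfer," written in PyTorch. Every name and dimension here (EditTransformerBlock, dim=256, the token counts) is a hypothetical illustration, not the authors' code.

# Hypothetical sketch of the Edit Transformer idea from the abstract:
# self-attention over a target view's tokens (intra-view consistency) plus
# cross-attention to tokens of the single edited view (inter-view style
# transfer). Names, sizes, and structure are assumptions, not the paper's code.
import torch
import torch.nn as nn


class EditTransformerBlock(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, target_tokens: torch.Tensor,
                edited_view_tokens: torch.Tensor) -> torch.Tensor:
        # Self-attention within the target view: intra-view consistency.
        x = self.norm1(target_tokens)
        target_tokens = target_tokens + self.self_attn(x, x, x)[0]
        # Cross-attention from the target view to the single edited view:
        # propagates the editing style to novel views without editing them.
        x = self.norm2(target_tokens)
        target_tokens = target_tokens + self.cross_attn(
            x, edited_view_tokens, edited_view_tokens)[0]
        return target_tokens + self.ffn(self.norm3(target_tokens))


# Usage: tokens from the one edited source view condition every target view.
block = EditTransformerBlock()
edited = torch.randn(1, 1024, 256)   # features of the single edited view
target = torch.randn(1, 1024, 256)   # features of a novel target view
out = block(target, edited)          # shape: (1, 1024, 256)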