Diffusion^2: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models
arXiv (2024)
Abstract
Recent advances in 3D generation are predominantly driven by
improvements in 3D-aware image diffusion models, which are pretrained on
Internet-scale image data and fine-tuned on massive 3D data, enabling them to
produce highly consistent multi-view images. However, due to
the scarcity of synchronized multi-view video data, it is impractical to adapt
this paradigm directly to 4D generation. Nevertheless, the available video and
3D data are adequate for training video and multi-view diffusion models that
provide satisfactory dynamic and geometric priors, respectively. In this
paper, we present Diffusion^2, a novel framework for dynamic 3D content
creation that leverages the knowledge of geometric consistency and temporal
smoothness in these models to directly sample dense multi-view,
multi-frame images, which can then be used to optimize a continuous 4D
representation. Specifically, we design a simple yet effective denoising
strategy via score composition of the video and multi-view diffusion models,
based on the probability structure of the images to be generated. Owing to the
high parallelism of the image generation and the efficiency of modern 4D
reconstruction pipelines, our framework can generate 4D content within a few
minutes. Furthermore, our method circumvents the reliance on 4D data and can
therefore potentially benefit from the scalability of foundation video
and multi-view diffusion models. Extensive experiments demonstrate the efficacy
of the proposed framework and its ability to flexibly adapt to various types
of prompts.
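The score-composition idea can be illustrated with a minimal sketch. It assumes the images to be generated form a V x T grid (views x frames). A multi-view model scores each column (all views at one frame), and a video model scores each row (all frames of one view). The two scores are then combined as a convex combination with weight `w`. This weighting, the EDM-style Euler step on the probability-flow ODE dx/dsigma = -sigma * score, and the toy Gaussian score functions `video_score` and `multiview_score` are all hypothetical stand-ins, not the paper's exact formulation or models.

```python
import numpy as np

V, T, D = 4, 8, 16  # views, frames, latent size per image (illustrative values)


def video_score(x_row, sigma):
    """Hypothetical video-model score for one view's frames, shape (T, D).
    Toy stand-in: exact score of N(0, (1 + sigma^2) I)."""
    return -x_row / (1.0 + sigma**2)


def multiview_score(x_col, sigma):
    """Hypothetical multi-view-model score for one frame's views, shape (V, D).
    Toy stand-in: exact score of N(0, (1 + sigma^2) I)."""
    return -x_col / (1.0 + sigma**2)


def composed_score(x, sigma, w=0.5):
    """Combine both scores over the (V, T, D) grid; w is an assumed weight."""
    # Video model runs along the time axis of each view (row-wise).
    s_vid = np.stack([video_score(x[v], sigma) for v in range(x.shape[0])], axis=0)
    # Multi-view model runs along the view axis of each frame (column-wise).
    s_mv = np.stack([multiview_score(x[:, t], sigma) for t in range(x.shape[1])], axis=1)
    return w * s_mv + (1.0 - w) * s_vid


def sample(sigmas, w=0.5, seed=0):
    """Euler integration of the probability-flow ODE with the composed score."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((V, T, D)) * sigmas[0]
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d = -s_cur * composed_score(x, s_cur, w)  # dx/dsigma
        x = x + (s_next - s_cur) * d
    return x


sigmas = np.append(np.geomspace(80.0, 0.002, 40), 0.0)  # decreasing noise levels
grid = sample(sigmas, w=0.5)
print(grid.shape)  # (4, 8, 16)
```

Every image in the grid is denoised in the same loop, which mirrors the parallelism the abstract mentions. In this toy setting, moving `w` toward 1 leans on the multi-view prior (geometric consistency) and moving it toward 0 leans on the video prior (temporal smoothness).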