Coherent 3D Portrait Video Reconstruction via Triplane Fusion
arXiv (2024)
Abstract
Recent breakthroughs in single-image 3D portrait reconstruction have enabled
telepresence systems to stream 3D portrait videos from a single camera in
real time, potentially democratizing telepresence. However, per-frame 3D
reconstruction exhibits temporal inconsistency and forgets the user's
appearance. On the other hand, self-reenactment methods can render coherent 3D
portraits by driving a personalized 3D prior, but fail to faithfully
reconstruct the user's per-frame appearance (e.g., facial expressions and
lighting). In this work, we recognize the need to maintain both coherent
identity and dynamic per-frame appearance to enable the best possible realism.
To this end, we propose a new fusion-based method that fuses a personalized 3D
subject prior with per-frame information, producing temporally stable 3D videos
with faithful reconstruction of the user's per-frame appearances. Trained only
using synthetic data produced by an expression-conditioned 3D GAN, our
encoder-based method achieves both state-of-the-art 3D reconstruction accuracy
and temporal consistency on in-studio and in-the-wild datasets.