3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models
CoRR (2024)
Abstract
In this work, we show that synthetic data created by generative models is
complementary to computer graphics (CG) rendered data for achieving remarkable
generalization performance on diverse real-world scenes for 3D human pose and
shape estimation (HPS). Specifically, we propose an effective approach based on
recent diffusion models, termed HumanWild, which can effortlessly generate
human images and corresponding 3D mesh annotations. We first collect a
large-scale human-centric dataset with comprehensive annotations, e.g., text
captions and surface normal images. Then, we train a customized ControlNet
model upon this dataset to generate diverse human images and initial
ground-truth labels. The key to this step is that numerous surface normal
images can be obtained from a 3D parametric human model, e.g., SMPL-X, by
rendering its 3D mesh onto the image plane. Because the initial labels are
inevitably noisy, we then apply an off-the-shelf foundation segmentation
model, i.e., SAM, to filter out negative data samples. Our data generation pipeline
is flexible and customizable to facilitate different real-world tasks, e.g.,
ego-centric scenes and perspective-distortion scenes. The generated dataset
comprises 0.79M images with corresponding 3D annotations, covering versatile
viewpoints, scenes, and human identities. We train various HPS regressors on
top of the generated data and evaluate them on a wide range of benchmarks
(3DPW, RICH, EgoBody, AGORA, SSP-3D) to verify the effectiveness of the
generated data. By exclusively employing generative models, we generate
large-scale in-the-wild human images and high-quality annotations, eliminating
the need for real-world data collection.
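
The abstract does not spell out how the SAM-based filtering decides which samples are negative. One plausible reading, sketched below purely as an assumption, is to compare the silhouette rendered from the 3D mesh with the person mask that a segmentation model predicts on the generated image, and to discard samples whose overlap is low. The function names `mask_iou` and `keep_sample` and the threshold value `0.7` are hypothetical and do not come from the paper; the masks here are toy arrays standing in for a rendered SMPL-X silhouette and a SAM prediction.

```python
import numpy as np


def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks of the same shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both masks are empty: treat as no agreement rather than divide by zero.
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def keep_sample(rendered_mask, predicted_mask, iou_threshold=0.7):
    """Keep a generated sample only if the two masks agree well enough.

    iou_threshold is a hypothetical value chosen for illustration.
    """
    return mask_iou(rendered_mask, predicted_mask) >= iou_threshold


# Toy stand-in for a silhouette rendered from the 3D mesh.
rendered = np.zeros((8, 8), dtype=bool)
rendered[2:6, 2:6] = True

# Toy stand-in for a segmentation mask that closely matches the silhouette.
good = np.zeros((8, 8), dtype=bool)
good[2:6, 2:7] = True

# Toy stand-in for a mask where the generated person drifted from the label.
bad = np.zeros((8, 8), dtype=bool)
bad[4:8, 4:8] = True

print(keep_sample(rendered, good))
print(keep_sample(rendered, bad))
```

In a sketch like this, the threshold trades dataset size against label quality: a stricter value keeps fewer images but leaves annotations that align more closely with the generated person.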