3D Facial Expressions through Analysis-by-Neural-Synthesis
CVPR 2024
Abstract
While existing methods for 3D face reconstruction from in-the-wild images
excel at recovering the overall face shape, they commonly miss subtle, extreme,
asymmetric, or rarely observed expressions. We improve upon these methods with
SMIRK (Spatial Modeling for Image-based Reconstruction of Kinesics), which
faithfully reconstructs expressive 3D faces from images. We identify two key
limitations in existing methods: shortcomings in their self-supervised training
formulation, and a lack of expression diversity in the training images. For
training, most methods employ differentiable rendering to compare a predicted
face mesh with the input image, along with a plethora of additional loss
functions. This differentiable rendering loss not only has to provide
supervision to optimize for 3D face geometry, camera, albedo, and lighting,
which is an ill-posed optimization problem, but the domain gap between
rendering and input image further hinders the learning process. Instead, SMIRK
replaces the differentiable rendering with a neural rendering module that,
given the rendered predicted mesh geometry, and sparsely sampled pixels of the
input image, generates a face image. As the neural rendering gets color
information from sampled image pixels, supervising with neural rendering-based
reconstruction loss can focus solely on the geometry. Further, it enables us to
generate images of the input identity with varying expressions while training.
These are then utilized as input to the reconstruction model and used as
supervision with ground truth geometry. This effectively augments the training
data and enhances the generalization for diverse expressions. Our qualitative,
quantitative and particularly our perceptual evaluations demonstrate that SMIRK
achieves new state-of-the-art performance on accurate expression
reconstruction. Project webpage: https://georgeretsi.github.io/smirk/.
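
The abstract says the neural rendering module receives the rendered predicted mesh geometry together with sparsely sampled pixels of the input image. The minimal sketch below illustrates how such an input could be assembled. It is not the authors' implementation: the function names, the 25% keep ratio, the zero-fill for dropped pixels, and the channel stacking are all illustrative assumptions, and images are plain nested lists rather than tensors.

```python
import random


def sample_sparse_pixels(image, keep_ratio=0.25, seed=0):
    """Keep a random subset of pixels and zero out the rest.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    `keep_ratio` is a hypothetical value, not taken from the paper.
    Returns (masked_image, mask), where mask[r][c] is 1 for kept pixels.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    n_keep = max(1, int(keep_ratio * height * width))
    kept = set(rng.sample(range(height * width), n_keep))
    mask = [[1 if (r * width + c) in kept else 0 for c in range(width)]
            for r in range(height)]
    masked = [[image[r][c] if mask[r][c] else (0.0, 0.0, 0.0)
               for c in range(width)]
              for r in range(height)]
    return masked, mask


def build_renderer_input(geometry_render, masked_image):
    """Stack one geometry channel with three color channels per pixel.

    `geometry_render` is a list of rows of floats standing in for the
    rendered predicted mesh (an assumption: a single shading channel).
    """
    return [[(g,) + tuple(rgb) for g, rgb in zip(g_row, rgb_row)]
            for g_row, rgb_row in zip(geometry_render, masked_image)]


# Toy 8x8 example: a flat gray image and a constant geometry render.
image = [[(0.5, 0.5, 0.5) for _ in range(8)] for _ in range(8)]
geometry = [[1.0 for _ in range(8)] for _ in range(8)]
masked, mask = sample_sparse_pixels(image, keep_ratio=0.25, seed=0)
renderer_input = build_renderer_input(geometry, masked)
```

Because the color information reaches the renderer only through the few kept pixels, a reconstruction loss on the renderer's output depends mainly on whether the geometry channel is correct, which is the motivation the abstract gives for this design.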