Neural Descent for Visual 3D Human Pose and Shape

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021)

Cited by 60 | Viewed 226
Abstract
We present a deep neural network methodology to reconstruct the 3D pose and shape of people, including hand gestures and facial expression, given an input RGB image. We rely on GHUM, a recently introduced, expressive full-body statistical 3D human model, trained end-to-end, and learn to reconstruct its pose and shape state in a self-supervised regime. Central to our methodology is a learning-to-learn-and-optimize approach, referred to as HUman Neural Descent (HUND), which avoids both second-order differentiation when training the model parameters and expensive state gradient descent to accurately minimize a semantic differentiable rendering loss at test time. Instead, we rely on novel recurrent stages to update the pose and shape parameters such that not only are losses minimized effectively, but the process is also meta-regularized to ensure end-progress. HUND's symmetry between training and testing makes it the first 3D human sensing architecture to natively support different operating regimes, including self-supervised ones. In diverse tests, we show that HUND achieves very competitive results on datasets such as H3.6M and 3DPW, as well as good-quality 3D reconstructions for complex imagery collected in the wild.
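The learning-to-learn update scheme described above can be pictured as a recurrent stage that proposes additive pose/shape parameter updates from the current loss value and image evidence, rather than from back-propagated state gradients. Below is a minimal, hypothetical PyTorch sketch of such a loop; the class names, dimensions, and the loss_fn interface are illustrative assumptions, not the authors' actual HUND architecture.

```python
import torch
import torch.nn as nn

class RecurrentUpdateStage(nn.Module):
    """Hypothetical recurrent refinement stage (illustrative, not the paper's code).

    Instead of running gradient descent on the state, a GRU consumes the
    current state estimate, image features, and the scalar loss value, and
    predicts an additive update -- loosely mirroring the learning-to-learn
    idea sketched in the abstract.
    """

    def __init__(self, state_dim: int, feat_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.rnn = nn.GRUCell(state_dim + feat_dim + 1, hidden_dim)
        self.head = nn.Linear(hidden_dim, state_dim)

    def forward(self, state, feat, loss_value, hidden):
        # Concatenate state, features, and per-sample loss into the GRU input.
        inp = torch.cat([state, feat, loss_value], dim=-1)
        hidden = self.rnn(inp, hidden)
        # Predict an additive update to the pose/shape parameter vector.
        return state + self.head(hidden), hidden


def refine(state, feat, loss_fn, stage, hidden_dim=256, num_stages=5):
    """Run a few refinement stages; loss_fn returns a per-sample scalar loss,
    e.g. a keypoint reprojection or differentiable rendering loss."""
    hidden = state.new_zeros(state.shape[0], hidden_dim)
    for _ in range(num_stages):
        loss = loss_fn(state).unsqueeze(-1)  # shape (B, 1)
        state, hidden = stage(state, feat, loss, hidden)
    return state
```

Because the update network is trained with only first-order information by minimizing the same losses it consumes at each stage, training and inference share one refinement procedure, which is the train/test symmetry the abstract highlights.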
Keywords
deep neural network methodology, hand gestures, facial expression, input RGB image, self-supervised regime, HUman Neural Descent, second-order differentiation, expensive state gradient descent, semantic differentiable rendering loss, shape parameters, H3.6M, 3DPW, 3D human sensing architecture, HUND symmetry, expressive full-body statistical 3D human model, visual 3D human pose, 3D reconstructions