SD-NeRF: Towards Lifelike Talking Head Animation Via Spatially-Adaptive Dual-Driven NeRFs

IEEE Transactions on Multimedia (2024)

Abstract
Recent years have witnessed great progress in audio-driven talking head animation. Among these methods, the 3D-based ones better preserve the 3D consistency of the generated head and produce more natural results than 2D-based approaches. However, most 3D-based methods employ 3D morphable face models as the intermediate representation and involve multi-stage training, which may lead to error accumulation. To alleviate this problem, in this paper, we propose a fully end-to-end talking head animation method, which implicitly grasps the 3D structure by learning a conditional Neural Radiance Field (NeRF). As NeRF has proven to be an effective tool for 3D modeling, one can learn dynamic neural radiance fields conditioned on audio signals for talking head synthesis. Furthermore, we argue that audio signals alone cannot drive a fully lifelike talking head. When people are talking, they usually show many spontaneous facial movements, such as blinks and brow movements, which make talkers look natural and real. These movements cannot be fully driven by the audio signals since they are largely unrelated to the audio. Therefore, we incorporate motion information as another driving factor and develop an audio-motion dual-driven NeRF model to take a step toward more lifelike talking head synthesis. On this basis, as audio and motion mainly affect different regions of the human face, we propose a Spatially-adaptive Dual-driven NeRF (SD-NeRF), which fuses these two driving factors with a spatially-adaptive cross-attention mechanism. Quantitative and qualitative results demonstrate that, with finer facial controls, our method produces more realistic talking head videos than existing advanced works. For more video results, including multi-view animation and cross-audio-driven results, please refer to our demonstration video https://cloud.tsinghua.edu.cn/f/7ebd663951e5403da4a5/.
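To make the dual-driven conditioning concrete, below is a minimal, illustrative sketch (not the authors' code) of the idea described in the abstract: a NeRF-style MLP conditioned on audio and motion features that are fused with cross-attention, where the 3D point embedding serves as the query so the fusion can differ across facial regions. All module names, feature dimensions, and the single-query attention layout are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class DualDrivenConditioner(nn.Module):
    """Fuse an audio feature and a motion feature with cross-attention,
    using the spatial point embedding as the query so the mixture of the
    two driving factors can vary across facial regions (a stand-in for
    the paper's spatially-adaptive fusion)."""
    def __init__(self, point_dim=63, cond_dim=64, heads=4):
        super().__init__()
        self.to_query = nn.Linear(point_dim, cond_dim)
        self.attn = nn.MultiheadAttention(cond_dim, heads, batch_first=True)

    def forward(self, point_emb, audio_feat, motion_feat):
        # point_emb: (B, point_dim) positional encoding of a 3D sample point
        # audio_feat, motion_feat: (B, cond_dim) per-frame driving features
        q = self.to_query(point_emb).unsqueeze(1)           # (B, 1, cond_dim)
        kv = torch.stack([audio_feat, motion_feat], dim=1)  # (B, 2, cond_dim)
        fused, _ = self.attn(q, kv, kv)                     # (B, 1, cond_dim)
        return fused.squeeze(1)

class ConditionalNeRF(nn.Module):
    """Tiny NeRF-style MLP: maps a point embedding plus the fused driving
    condition to density and RGB. A real talking-head NeRF also takes the
    view direction, uses a deeper network, and is trained end to end with
    volume rendering against video frames."""
    def __init__(self, point_dim=63, cond_dim=64, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(point_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (density, r, g, b)
        )

    def forward(self, point_emb, cond):
        out = self.mlp(torch.cat([point_emb, cond], dim=-1))
        sigma, rgb = out[..., :1], torch.sigmoid(out[..., 1:])
        return sigma, rgb

# Usage: one batch of sample points driven by per-frame audio/motion features.
B = 8
cond = DualDrivenConditioner()
nerf = ConditionalNeRF()
fused = cond(torch.randn(B, 63), torch.randn(B, 64), torch.randn(B, 64))
sigma, rgb = nerf(torch.randn(B, 63), fused)
```

In this sketch, audio and motion compete for attention weight at every sampled point, so mouth-region points can attend mostly to the audio feature while eye- and brow-region points can attend mostly to the motion feature, which is the intuition behind the spatially-adaptive fusion.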
Keywords
Attention mechanism, neural radiance fields, talking head video synthesis