GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting
arXiv (2024)
Abstract
Recent works on audio-driven talking head synthesis using Neural Radiance
Fields (NeRF) have achieved impressive results. However, due to inadequate pose
and expression control caused by NeRF's implicit representation, these methods
still suffer from limitations such as unsynchronized or unnatural lip movements,
visual jitter, and artifacts. In this paper, we propose GaussianTalker, a
novel method for audio-driven talking head synthesis based on 3D Gaussian
Splatting. Exploiting the explicit representation of 3D Gaussians, we achieve
intuitive control of facial motion by binding Gaussians to a 3D facial model.
GaussianTalker consists of two modules: the Speaker-specific Motion Translator
and the Dynamic Gaussian Renderer. The Speaker-specific Motion Translator
achieves accurate lip movements specific to the target speaker through
universalized audio feature extraction and customized lip motion generation.
The Dynamic Gaussian Renderer introduces Speaker-specific BlendShapes to enhance
facial detail representation via a latent pose, delivering stable and realistic
rendered videos. Extensive experimental results suggest that GaussianTalker
outperforms existing state-of-the-art methods in talking head synthesis,
delivering precise lip synchronization and exceptional visual quality. Our
method achieves rendering speeds of 130 FPS on an NVIDIA RTX 4090 GPU,
significantly exceeding the threshold for real-time rendering performance, and
can potentially be deployed on other hardware platforms.
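To make the blendshape idea in the abstract concrete, the sketch below shows how explicit Gaussian centers can be deformed as a weighted sum of per-Gaussian displacement bases. This is a minimal illustration under assumed shapes and names (NumPy arrays, random bases); it is not the paper's actual Speaker-specific BlendShapes implementation, whose weights would be predicted from audio and a latent pose.

```python
import numpy as np

# Minimal sketch: linear blendshape deformation of 3D Gaussian centers.
# All names, shapes, and values here are illustrative assumptions.
N, K = 5, 3                                # N Gaussians, K blendshape bases
rng = np.random.default_rng(0)

mu_neutral = rng.normal(size=(N, 3))       # neutral (rest-pose) Gaussian centers
bases = rng.normal(size=(K, N, 3)) * 0.1   # per-Gaussian displacement bases
weights = np.array([0.5, 0.2, 0.0])        # blend weights (e.g. from audio/pose)

# Deformed centers = neutral centers + weighted sum of blendshape offsets
mu = mu_neutral + np.einsum("k,knd->nd", weights, bases)
print(mu.shape)  # (5, 3)
```

Because the Gaussians are explicit points, the deformation is a direct linear operation on their centers, which is what makes the facial motion easy to control compared with an implicit NeRF field.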