Speech Driven Tongue Animation

IEEE Conference on Computer Vision and Pattern Recognition (2022)

Abstract
Advances in speech-driven animation techniques allow the creation of convincing animations for virtual characters solely from audio data. Many existing approaches focus on facial and lip motion and often do not provide realistic animation of the inner mouth. This paper addresses the problem of speech-driven inner mouth animation. Obtaining performance capture data of the tongue and jaw from video alone is difficult because the inner mouth is only partially observable during speech. In this work, we introduce a large-scale speech and mocap dataset that focuses on capturing tongue, jaw, and lip motion. This dataset enables research using data-driven techniques to generate realistic inner mouth animation from speech. We then propose a deep-learning-based method for accurate and generalizable speech-driven tongue and jaw animation, and evaluate several encoder-decoder network architectures and audio feature encoders. We find that recent self-supervised, deep-learning-based audio feature encoders are robust, generalize well to unseen speakers and content, and work best for our task. To demonstrate the practical application of our approach, we show animations on high-quality parametric 3D face models driven by the landmarks generated from our speech-to-tongue animation method.
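
The abstract describes a pipeline in which a self-supervised audio feature encoder feeds a decoder network that regresses tongue and jaw landmarks from speech. As a rough illustration only (not the paper's implementation), the PyTorch sketch below pairs a frozen wav2vec 2.0 encoder (one widely used self-supervised audio model, loaded via torchaudio) with a small GRU decoder; the landmark count, decoder choice, and layer sizes are placeholder assumptions.

    # Minimal sketch of a speech-to-articulation regressor in PyTorch.
    # Assumptions (not from the paper): wav2vec 2.0 as the self-supervised
    # audio encoder, a GRU decoder, and 10 tongue/jaw landmarks in 3D.
    import torch
    import torchaudio

    NUM_LANDMARKS = 10  # hypothetical; the paper's mocap marker count may differ

    class SpeechToTongue(torch.nn.Module):
        def __init__(self, feat_dim: int = 768, hidden: int = 256):
            super().__init__()
            bundle = torchaudio.pipelines.WAV2VEC2_BASE
            self.sample_rate = bundle.sample_rate   # 16 kHz input audio
            self.encoder = bundle.get_model()       # pretrained self-supervised encoder
            for p in self.encoder.parameters():     # keep the encoder frozen;
                p.requires_grad = False             # only the decoder is trained
            self.decoder = torch.nn.GRU(feat_dim, hidden, batch_first=True)
            self.head = torch.nn.Linear(hidden, NUM_LANDMARKS * 3)

        def forward(self, waveform: torch.Tensor) -> torch.Tensor:
            # waveform: (batch, samples) at self.sample_rate
            feats, _ = self.encoder(waveform)       # (batch, frames, feat_dim)
            hidden, _ = self.decoder(feats)         # (batch, frames, hidden)
            out = self.head(hidden)                 # (batch, frames, landmarks*3)
            return out.view(out.shape[0], out.shape[1], NUM_LANDMARKS, 3)

    model = SpeechToTongue()
    audio = torch.randn(1, 16000)  # 1 s of dummy audio standing in for real speech
    landmarks = model(audio)
    print(landmarks.shape)         # e.g. torch.Size([1, 49, 10, 3])

In the paper's setting, the predicted per-frame landmarks then drive a parametric 3D face model; here the output is simply a landmark tensor, and the regressor would be trained against the speech-and-mocap dataset the paper introduces.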
Keywords
Face and gestures, Vision + graphics