Synthesizing 3D Trump: Predicting and Visualizing the Relationship Between Text, Speech, and Articulatory Movements.

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2019)

Cited 1 | Views 52
Abstract
The movements of articulators such as the lips, tongue, and teeth play an important role in language expression by revealing information hidden in text or speech. It is therefore worthwhile to mine and visualize the relationship between text, speech, and articulatory movements, so that language can be understood across modalities and at multiple levels. As a case study, given text and audio of President Donald John Trump, this paper synthesizes a high-quality 3D animation of him speaking, with accurate synchronization between speech and articulators. First, visual co-articulation is modeled by learning the mapping from text and speech to articulatory movements. Then, based on a reconstructed 3D head model, physiological characteristics and statistical learning are combined to visualize each phoneme. Finally, the visualization results of consecutive phonemes are fused by the visual co-articulation model to generate synchronized articulatory animations. Experiments show that the system not only produces photo-realistic results from the frontal view but also distinguishes the visual differences among phonemes from unconstrained views.
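The fusion step described above blends the articulatory targets of consecutive phonemes so that each frame reflects its neighbors. A minimal sketch of this idea, using Gaussian dominance functions over a phoneme timeline (the phoneme labels, target values, and `width` parameter are illustrative assumptions, not taken from the paper):

```python
import math

# Hypothetical per-phoneme articulatory targets (lip opening, jaw, tongue);
# illustrative values only -- not from the paper.
VISEME_TARGETS = {
    "AA": (0.9, 0.8, 0.2),
    "M":  (0.0, 0.1, 0.3),
    "S":  (0.3, 0.2, 0.8),
}

def dominance(t, center, width):
    """Gaussian dominance: how strongly a phoneme influences frame time t."""
    return math.exp(-((t - center) / width) ** 2)

def blend_frame(t, segments, width=0.06):
    """Blend articulatory targets of neighboring phonemes at time t.

    segments: list of (phoneme, start_sec, end_sec).
    Returns the co-articulated target as a normalized weighted average.
    """
    weighted, total = [], 0.0
    for ph, start, end in segments:
        center = 0.5 * (start + end)
        w = dominance(t, center, width)
        weighted.append((w, VISEME_TARGETS[ph]))
        total += w
    return tuple(
        sum(w * target[i] for w, target in weighted) / total
        for i in range(3)
    )

segments = [("M", 0.00, 0.10), ("AA", 0.10, 0.30), ("S", 0.30, 0.40)]
frame = blend_frame(0.20, segments)  # mid-vowel: dominated by "AA"
```

Sampling `blend_frame` at every animation frame yields smooth transitions between phoneme poses, since each frame is pulled toward nearby phonemes in proportion to their temporal proximity.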
Keywords
Animation, Visualization, Feature extraction, Head, Acoustics, Linguistics, Three-dimensional displays, Visual co-articulation, Speech animation