Generation of speech and facial animation with controllable articulatory effort for amusing conversational characters

Joakim Gustafson, Eva Szekely, Jonas Beskow

Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents, IVA 2023 (2023)

Abstract
Engaging embodied conversational agents need to generate expressive behavior in order to be believable in social interactions. We present a system that can generate spontaneous speech with supporting lip movements. The neural conversational TTS voice is trained on a multi-style speech corpus that has been prosodically tagged (pitch and speaking rate) and transcribed (including tokens for breathing, fillers and laughter). We introduce a speech animation algorithm in which articulatory effort can be adjusted. The facial animation is driven by time-stamped phonemes and prominence estimates from the synthesised speech waveform, which modulate the lip and jaw movements accordingly. In objective evaluations we show that the system is able to generate speech and facial animation that vary in articulatory effort. In subjective evaluations we compare our conversational TTS system's capability to deliver jokes with that of a commercial TTS. Both systems performed equally well.
Keywords
ECAs, speech synthesis, facial animation, humour generation