
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Generating vivid and emotional 3D co-speech gestures is crucial for virtual avatar animation in human-machine interaction applications. While existing methods can generate gestures that follow a single emotion label, they overlook that modeling long gesture sequences with emotion transitions is more practical in real scenes. In addition, the lack of large-scale datasets pairing emotion-transition speech with corresponding 3D human gestures limits progress on this task. To address this, we first combine ChatGPT-4 with an audio inpainting approach to construct high-fidelity emotion-transition human speech. Because obtaining realistic 3D pose annotations for the dynamically inpainted emotion-transition audio is extremely difficult, we propose a novel weakly supervised training strategy to encourage authentic gesture transitions. Specifically, to improve the coordination of transition gestures w.r.t. the different emotional gestures around them, we model the temporal association representation between two different emotional gesture sequences as style guidance and infuse it into the transition generation. We further devise an emotion mixture mechanism that provides weak supervision for transition gestures based on a learnable mixed emotion label. Last, we present a keyframe sampler that supplies effective initial posture cues in long sequences, enabling us to generate diverse gestures. Extensive experiments demonstrate that our method outperforms state-of-the-art models constructed by adapting single emotion-conditioned counterparts on our newly defined emotion transition task and datasets. Our code and dataset will be released on the project page:
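The abstract does not spell out how the mixed emotion label is formed, so the following is only a minimal sketch of one plausible reading: a convex combination of two one-hot emotion labels whose weight comes from a scalar that would be learned during training. The emotion set, the function names, and the sigmoid parameterization are all assumptions made for illustration and are not taken from the paper.

```python
import math

# Hypothetical emotion vocabulary; the paper's actual categories may differ.
EMOTIONS = ["neutral", "happy", "angry", "sad"]


def one_hot(name):
    """Return a one-hot list over EMOTIONS for the given emotion name."""
    return [1.0 if e == name else 0.0 for e in EMOTIONS]


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mixed_emotion_label(src, dst, logit):
    """Blend the source and target emotion labels into one soft label.

    `logit` stands in for a learnable scalar (an assumption); its sigmoid
    is the weight given to the target emotion.
    """
    w = sigmoid(logit)
    a, b = one_hot(src), one_hot(dst)
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]


# With logit = 0 the weight is 0.5, so both emotions contribute equally.
label = mixed_emotion_label("happy", "sad", logit=0.0)
print(label)
```

Under this reading, the soft label could serve as the weak target for an emotion classifier applied to the generated transition gestures, where no ground-truth pose annotations exist.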
Co-speech Gestures, Emotional Transition, Gesture Generation, Real Scenes, Human-machine Interaction, Human Speech, Inpainting, 3D Pose, Weak Supervision, Human Gestures, Single Emotion, Disgust, Upper Body, Matrix Multiplication, Temporal Correlation, Head And Tail, 3D Position, Emotion Categories, Human Motion, 3D Motion, Temporal Coherence, 3D Human Model