Generating Human Interaction Motions in Scenes with Text Control
CoRR (2024)
Abstract
We present TeSMo, a method for text-controlled scene-aware motion generation
based on denoising diffusion models. Previous text-to-motion methods focus on
characters in isolation without considering scenes due to the limited
availability of datasets that include motion, text descriptions, and
interactive scenes. Our approach begins with pre-training a scene-agnostic
text-to-motion diffusion model, emphasizing goal-reaching constraints on
large-scale motion-capture datasets. We then enhance this model with a
scene-aware component, fine-tuned using data augmented with detailed scene
information, including ground plane and object shapes. To facilitate training,
we embed annotated navigation and interaction motions within scenes. The
proposed method produces realistic and diverse human-object interactions, such
as navigation and sitting, in different scenes with various object shapes,
orientations, initial body positions, and poses. Extensive experiments
demonstrate that our approach surpasses prior techniques in terms of the
plausibility of human-scene interactions, as well as the realism and variety of
the generated motions. Code will be released upon publication of this work at
https://research.nvidia.com/labs/toronto-ai/tesmo.