CoMo: Controllable Motion Generation through Language Guided Pose Code Editing
arXiv (2024)
Abstract
Text-to-motion models excel at efficient human motion generation, but
existing approaches lack fine-grained controllability over the generation
process. Consequently, modifying subtle postures within a motion or inserting
new actions at specific moments remains a challenge, limiting the applicability
of these methods in diverse scenarios. In light of these challenges, we
introduce CoMo, a Controllable Motion generation model, adept at accurately
generating and editing motions by leveraging the knowledge priors of large
language models (LLMs). Specifically, CoMo decomposes motions into discrete and
semantically meaningful pose codes, with each code encapsulating the semantics
of a body part, representing elementary information such as "left knee slightly
bent". Given textual inputs, CoMo autoregressively generates sequences of pose
codes, which are then decoded into 3D motions. Leveraging pose codes as
interpretable representations, an LLM can directly intervene in motion editing
by adjusting the pose codes according to editing instructions. Experiments
demonstrate that CoMo achieves competitive performance in motion generation
compared to state-of-the-art models while, in human studies, CoMo substantially
surpasses previous work in motion editing abilities.
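The abstract describes pose codes as discrete, interpretable per-body-part tokens that an LLM can edit directly from an instruction. The paper's actual code vocabulary, tokenizer, and LLM interface are not given here, so the following is only a minimal sketch under assumed names (`pose_codes`, `apply_edit`) of what "editing pose codes" could look like at the data level:

```python
# Hypothetical sketch: pose codes as discrete, human-readable per-body-part
# states. CoMo's real codebook and editing pipeline are not in this abstract.

# One dict per frame; each body part carries an elementary semantic label,
# e.g. "left knee slightly bent" from the abstract.
pose_codes = [
    {"left_knee": "straight", "right_arm": "raised", "torso": "upright"},
    {"left_knee": "straight", "right_arm": "lowered", "torso": "upright"},
]

def apply_edit(codes, body_part, new_state):
    """Mimic an LLM-driven edit: rewrite one body part's code in every frame,
    leaving all other body parts untouched."""
    return [{**frame, body_part: new_state} for frame in codes]

# An instruction like "bend the left knee slightly" would map to:
edited = apply_edit(pose_codes, "left_knee", "slightly_bent")
```

Because the codes are interpretable text labels rather than opaque latents, an instruction-following model can localize an edit to one body part without retraining the motion decoder, which is the controllability argument the abstract makes.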