Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation
CoRR(2024)
摘要
Recent large-scale pre-trained diffusion models have demonstrated a powerful
generative ability to produce high-quality videos from detailed text
descriptions. However, exerting control over the motion of objects in videos
generated by any video diffusion model is a challenging problem. In this paper,
we propose a novel zero-shot moving object trajectory control framework,
Motion-Zero, to enable a bounding-box-trajectories-controlled text-to-video
diffusion model.To this end, an initial noise prior module is designed to
provide a position-based prior to improve the stability of the appearance of
the moving object and the accuracy of position. In addition, based on the
attention map of the U-net, spatial constraints are directly applied to the
denoising process of diffusion models, which further ensures the positional and
spatial consistency of moving objects during the inference. Furthermore,
temporal consistency is guaranteed with a proposed shift temporal attention
mechanism. Our method can be flexibly applied to various state-of-the-art video
diffusion models without any training process. Extensive experiments
demonstrate our proposed method can control the motion trajectories of objects
and generate high-quality videos.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要