EchoReel: Enhancing Action Generation of Existing Video Diffusion Models
CoRR (2024)
Abstract
Recent large-scale video datasets have facilitated the generation of diverse
open-domain videos by Video Diffusion Models (VDMs). Nonetheless, the efficacy
of VDMs in assimilating complex knowledge from these datasets remains
constrained by their inherent scale, leading to suboptimal comprehension and
synthesis of numerous actions. In this paper, we introduce EchoReel, a novel
approach to augment the capability of VDMs in generating intricate actions by
emulating motions from pre-existing videos, which are readily accessible from
databases or online repositories. EchoReel seamlessly integrates with existing
VDMs, enhancing their ability to produce realistic motions without compromising
their fundamental capabilities. Specifically, the Action Prism (AP) is
introduced to distill motion information from reference videos, and it requires
training on only a small dataset. Leveraging the knowledge from pre-trained
VDMs, EchoReel incorporates new action features into VDMs through
additional layers, eliminating the need for any further fine-tuning on
untrained actions. Extensive experiments demonstrate that EchoReel does not
merely replicate the whole content of the references, and that it significantly
improves the generation of realistic actions, even in situations where existing
VDMs might fail outright.