Video Interpolation with Diffusion Models
CVPR 2024
Abstract
We present VIDIM, a generative model for video interpolation, which creates
short videos given a start and end frame. In order to achieve high fidelity and
generate motions unseen in the input data, VIDIM uses cascaded diffusion models
to first generate the target video at low resolution, and then generate the
high-resolution video conditioned on the low-resolution generated video. We
compare VIDIM to previous state-of-the-art methods on video interpolation, and
demonstrate how those methods fail in most settings where the underlying motion is
complex, nonlinear, or ambiguous, while VIDIM handles such cases with ease. We
additionally demonstrate how classifier-free guidance on the start and end
frame and conditioning the super-resolution model on the original
high-resolution frames without additional parameters unlocks high-fidelity
results. VIDIM is fast to sample from as it jointly denoises all the frames to
be generated, requires less than a billion parameters per diffusion model to
produce compelling results, and still enjoys scalability and improved quality
at larger parameter counts.
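The classifier-free guidance mentioned above is a standard diffusion-sampling technique: at each denoising step the model is evaluated both with and without the conditioning signal (here, the start and end frames), and the two noise predictions are extrapolated by a guidance weight. A minimal sketch, where `denoiser` and the conditioning-frame argument are hypothetical stand-ins rather than VIDIM's actual API:

```python
import numpy as np

def cfg_noise_prediction(denoiser, x_t, t, cond_frames, guidance_weight):
    """Classifier-free guidance at one denoising step.

    `denoiser(x_t, t, cond)` is a hypothetical noise-prediction model;
    passing cond=None stands in for dropping the frame conditioning,
    as done during training so a single model serves both roles.
    """
    eps_cond = denoiser(x_t, t, cond_frames)   # conditioned on start/end frames
    eps_uncond = denoiser(x_t, t, None)        # unconditional prediction
    # Extrapolate away from the unconditional prediction; weight 1.0
    # recovers plain conditional sampling, larger values strengthen
    # adherence to the conditioning frames.
    return eps_uncond + guidance_weight * (eps_cond - eps_uncond)
```

With a guidance weight above 1.0, samples track the given start and end frames more tightly, at some cost in diversity.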