Video frame interpolation via spatial multi-scale modelling

Zhe Qu, Weijing Liu, Lizhen Cui, Xiaohui Yang

IET Computer Vision (2024)

Abstract
Video frame interpolation (VFI) synthesises intermediate frames between adjacent original video frames to increase the temporal resolution of the video. However, existing methods usually rely on heavy model architectures with a large number of parameters. The authors introduce an efficient VFI network based on multiple lightweight convolutional units and a local three-scale encoding (LTSE) structure. Firstly, the authors introduce an LTSE structure with two-level attention cascades, a design tailored to capture details and contextual information efficiently across diverse scales in images. Secondly, the authors introduce recurrent convolutional layers (RCL) and residual operations, designing the recurrent residual convolutional unit to optimise the LTSE structure. Additionally, a lightweight convolutional unit named the separable recurrent residual convolutional unit (S_RRCU) is introduced to reduce the model parameters. Finally, the authors obtain three-scale decoding features from the decoder and warp them to produce a set of three-scale pre-warped maps, which are fused into the synthesis network to generate high-quality interpolated frames. The experimental results indicate that the proposed approach achieves superior performance with fewer model parameters. This is a revised version of the authors' manuscript, which proposes a lightweight VFI network based on multiple lightweight convolutional units and a three-scale encoding-decoding structure. The proposed model learns features adaptively to ensure effective inference of motion information. Moreover, the authors design a lightweight convolutional unit, S_RRCU, to reduce the model parameters.
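The parameter saving attributed to the separable recurrent residual convolutional unit can be illustrated with simple arithmetic. The sketch below assumes that "separable" refers to a depthwise separable convolution (a per-channel k x k convolution followed by a 1 x 1 pointwise convolution); the abstract does not specify the unit's internal layout, and the function names and layer sizes here are hypothetical and chosen only for illustration.

```python
def conv_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Parameter count of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def separable_conv_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Parameter count of a depthwise k x k convolution followed by
    a 1 x 1 pointwise convolution (assumed meaning of "separable")."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, not taken from the paper.
    c_in, c_out, k = 64, 64, 3
    standard = conv_params(c_in, c_out, k)
    separable = separable_conv_params(c_in, c_out, k)
    print(f"standard:  {standard} parameters")
    print(f"separable: {separable} parameters")
    print(f"reduction: {standard / separable:.2f}x")
```

Under these assumptions the separable layer needs roughly c_in * (k * k + c_out) weights instead of c_in * c_out * k * k, so the saving grows with the kernel size and the number of output channels. A recurrent convolutional layer that reuses the same weights at every recurrence step would add no further parameters per step, which is consistent with the lightweight design the abstract describes.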
Keywords
computer vision, image motion analysis, learning (artificial intelligence), video signal processing