Raformer: Redundancy-Aware Transformer for Video Wire Inpainting
arXiv (2024)
Abstract
Video Wire Inpainting (VWI) is a prominent application in video inpainting,
aimed at flawlessly removing wires in films or TV series, offering significant
time and labor savings compared to manual frame-by-frame removal. However, wire
removal poses greater challenges due to the wires being longer and slimmer than
objects typically targeted in general video inpainting tasks, and often
intersecting with people and background objects irregularly, which adds
complexity to the inpainting process. Recognizing the limitations posed by
existing video wire datasets, which are characterized by their small size, poor
quality, and limited variety of scenes, we introduce a new VWI dataset with a
novel mask generation strategy, namely Wire Removal Video Dataset 2 (WRV2) and
Pseudo Wire-Shaped (PWS) Masks. The WRV2 dataset comprises over 4,000 videos with
an average length of 80 frames, designed to facilitate the development and
efficacy of inpainting models. Building upon this, our research proposes the
Redundancy-Aware Transformer (Raformer) method that addresses the unique
challenges of wire removal in video inpainting. Unlike conventional approaches
that indiscriminately process all frame patches, Raformer employs a novel
strategy to selectively bypass redundant parts, such as static background
segments devoid of valuable information for inpainting. At the core of Raformer
is the Redundancy-Aware Attention (RAA) module, which isolates and accentuates
essential content through a coarse-grained, window-based attention mechanism.
This is complemented by a Soft Feature Alignment (SFA) module, which refines
these features and achieves end-to-end feature alignment. Extensive experiments
on both traditional video inpainting datasets and our proposed WRV2 dataset
demonstrate that Raformer outperforms other state-of-the-art methods.
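The abstract describes the Redundancy-Aware Attention (RAA) module only at a high level. As an illustration of the stated idea of bypassing redundant patches via a coarse-grained, window-based attention mechanism, the sketch below groups patch tokens into windows, scores each window, and attends only over the top-scoring windows. This is a minimal assumption-based sketch, not the paper's actual implementation: the class name RedundancyAwareWindowAttention, the mean-pooled window descriptor, the linear scoring head, and the keep_ratio parameter are all illustrative choices.

```python
import torch
import torch.nn as nn

class RedundancyAwareWindowAttention(nn.Module):
    """Hypothetical sketch of coarse-grained, window-based attention that
    bypasses redundant (low-information) windows before attending."""

    def __init__(self, dim, window_size=8, keep_ratio=0.5, num_heads=4):
        super().__init__()
        self.window_size = window_size
        self.keep_ratio = keep_ratio
        self.score = nn.Linear(dim, 1)  # per-window importance score (assumed form)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, N, C) patch tokens from video frames; N assumed divisible by window_size
        B, N, C = x.shape
        w = self.window_size
        windows = x.view(B, N // w, w, C)          # group patches into coarse windows
        win_feat = windows.mean(dim=2)             # one descriptor per window
        scores = self.score(win_feat).squeeze(-1)  # (B, N // w) redundancy/importance scores
        k = max(1, int(scores.shape[1] * self.keep_ratio))
        top_idx = scores.topk(k, dim=1).indices    # keep the most informative windows
        idx = top_idx.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, w, C)
        kept = torch.gather(windows, 1, idx).reshape(B, k * w, C)
        out, _ = self.attn(kept, kept, kept)       # attention over kept tokens only
        return out, top_idx

# Example usage on 512 patch tokens of dimension 96 (shapes are illustrative):
# raa = RedundancyAwareWindowAttention(dim=96, window_size=8, keep_ratio=0.5)
# out, kept_windows = raa(torch.randn(2, 512, 96))
```

Selecting at window granularity keeps the redundancy decision coarse and cheap; windows judged redundant (e.g., static background) never enter the attention computation, which matches the motivation given in the abstract.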