VmambaIR: Visual State Space Model for Image Restoration
CoRR(2024)
Abstract
Image restoration is a critical task in low-level computer vision, aiming to
restore high-quality images from degraded inputs. Various models, such as
convolutional neural networks (CNNs), generative adversarial networks (GANs),
transformers, and diffusion models (DMs), have been employed to address this
problem with significant impact. However, CNNs have limitations in capturing
long-range dependencies. DMs require large prior models and computationally
intensive denoising steps. Transformers have powerful modeling capabilities but
face challenges due to quadratic complexity with respect to input image size. To address
these challenges, we propose VmambaIR, which introduces State Space Models
(SSMs) with linear complexity into comprehensive image restoration tasks. We
utilize a U-Net architecture to stack our proposed Omni Selective Scan (OSS)
blocks, consisting of an OSS module and an Efficient Feed-Forward Network
(EFFN). Our proposed omni selective scan mechanism overcomes the unidirectional
modeling limitation of SSMs by efficiently modeling image information flows in
all six directions. Furthermore, we conduct a comprehensive evaluation of our
VmambaIR across multiple image restoration tasks, including image deraining,
single image super-resolution, and real-world image super-resolution. Extensive
experimental results demonstrate that our proposed VmambaIR achieves
state-of-the-art (SOTA) performance with much fewer computational resources and
parameters. Our research highlights the potential of state space models as
promising alternatives to the transformer and CNN architectures in serving as
foundational frameworks for next-generation low-level visual tasks.
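To make the six-direction idea concrete, the sketch below shows one plausible way to unroll a feature map into multiple 1-D scan orderings before feeding each to a sequence model. This is an illustrative assumption, not the paper's implementation: the exact six directions (here, four spatial orderings plus two along the channel axis) and the function name `omni_scan_orders` are hypothetical.

```python
import numpy as np

def omni_scan_orders(x):
    """Illustrative sketch (NOT the paper's code): flatten a (C, H, W)
    feature map into six 1-D sequences, one per assumed scan direction.
    SSMs model each sequence causally, so scanning in several directions
    lets every position attend to context from all sides."""
    C, H, W = x.shape
    seqs = []
    row = x.reshape(C, H * W)                      # row-major spatial order
    col = x.transpose(0, 2, 1).reshape(C, H * W)   # column-major spatial order
    seqs.append(row)             # left-to-right, top-to-bottom
    seqs.append(row[:, ::-1])    # reversed row-major order
    seqs.append(col)             # top-to-bottom, left-to-right
    seqs.append(col[:, ::-1])    # reversed column-major order
    chan = x.transpose(1, 2, 0).reshape(H * W, C)  # scan along channels
    seqs.append(chan)            # channel-forward ordering
    seqs.append(chan[:, ::-1])   # channel-backward ordering
    return seqs
```

After each directional sequence is processed by an SSM with linear complexity in sequence length, the outputs would be folded back to the spatial grid and fused, which is how a causal 1-D model can approximate global 2-D context.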