Learning Omni-frequency Region-adaptive Representations for Real Image Super-Resolution

Xin Li
Tao Yu
Yingxue Pang
Simeng Sun

Other Links: arxiv.org

Abstract:

Traditional single image super-resolution (SISR) methods focus on solving a single, uniform degradation (i.e., bicubic down-sampling) and typically suffer from poor performance when applied to real-world low-resolution (LR) images due to their complicated realistic degradations. The key to solving this more challenging real image sup…

Introduction
Highlights
  • With the development of deep learning, single image super-resolution (SISR) has achieved great success either on PSNR values (Dong et al 2015; Haris, Shakhnarovich, and Ukita 2018; Kim, Kwon Lee, and Mu Lee 2016; Lim et al 2017; Zhang et al 2018a; Dai et al 2019; Mei et al 2020; Pan et al 2020) or on visual quality (Ledig et al 2017; Sajjadi, Scholkopf, and Hirsch 2017)
  • Based on our analysis, we propose an Omni-frequency Region-adaptive Network (OR-Net) for real image super-resolution (RealSR), which contains two technical novelties: 1) a Frequency Decomposition (FD) module that separates LR image content in the frequency domain and enhances texture details across all frequency components, and 2) a Region-adaptive Frequency Aggregation (RFA) module that appropriately restores the different frequency components of real HR images at different spatial positions
  • To imitate the wavelet transform while avoiding the loss of key information, we propose to factorize the mixed feature representations through learnable latent-wise spatial down-sampling; a similar operation can be found in the recently proposed octave convolution (OctConv) (Akbari et al 2020), but OctConv aims to reduce channel-wise redundancy, like group or depth-wise convolutions
  • We provide a qualitative comparison of our Omni-frequency Region-adaptive Network (OR-Net) with state-of-the-art conventional SISR methods and RealSR methods
  • We propose an Omni-frequency Region-adaptive Network (OR-Net) to enable effective real image super-resolution
  • Extensive experiments on several benchmarks demonstrate the superiority of OR-Net, and comprehensive ablation analysis verifies the effectiveness of the FD and Region-adaptive Frequency Aggregation (RFA) modules
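The Frequency Decomposition idea in the highlights above can be illustrated with a minimal sketch: a Laplacian-pyramid-style split that separates a map into high-frequency residuals plus a final low-frequency component, where the low-frequency part comes from spatial down-sampling. This is only an illustrative stand-in using fixed average pooling; the paper's FD module instead uses learnable latent-wise spatial down-sampling, so the names here (`decompose_frequencies`, the pooling choice, the number of branches) are assumptions, not the authors' implementation.

```python
import numpy as np

def avg_pool2(x):
    """2x average pooling (a crude fixed low-pass filter)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x upsampling back to the finer grid."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def decompose_frequencies(x, levels=2):
    """Split a 2-D map into `levels` high-frequency residuals plus one
    final low-frequency component (Laplacian-pyramid style)."""
    bands = []
    current = x
    for _ in range(levels):
        low = avg_pool2(current)
        high = current - upsample2(low)   # residual = high-frequency detail
        bands.append(high)
        current = low
    bands.append(current)                 # remaining low-frequency content
    return bands

# The split is exactly invertible: up-sampling and adding the bands back
# reconstructs the input, so no key information is lost by the decomposition.
x = np.arange(64, dtype=np.float64).reshape(8, 8)
bands = decompose_frequencies(x, levels=2)
recon = bands[-1]
for high in reversed(bands[:-1]):
    recon = upsample2(recon) + high
assert np.allclose(recon, x)
```

The invertibility check at the end is the point of the sketch: like a wavelet transform, the decomposition loses nothing, which matches the paper's stated motivation for replacing fixed wavelets with a learnable down-sampling that preserves information.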
Methods
  • The compared methods are Bicubic, VDSR, EDSR, RDN, DDBPN, RCAN, LP-KPN, CDC, and OR-Net (ours), spanning both the SISR and RealSR categories, evaluated on DRealSR at scales ×2, ×3, and ×4 with PSNR, SSIM, and LPIPS.
Conclusion
  • The authors propose an Omni-frequency Region-adaptive Network (OR-Net) to enable effective real image super-resolution.
  • To ensure that the learned feature representations of OR-Net are both informative and content-aware, the authors start from the frequency perspective and design a Frequency Decomposition (FD) module that fully leverages omni-frequency features to comprehensively enhance texture details in LR images.
  • Extensive experiments on several benchmarks demonstrate the superiority of OR-Net, and comprehensive ablation analysis verifies the effectiveness of the FD and RFA modules
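The Region-adaptive Frequency Aggregation idea can be sketched as a per-pixel soft weighting over frequency branches: one score map per branch, a softmax across branches at every spatial position, and a weighted sum, so that different image regions draw on different frequency components. This is a hedged illustration; the random `logits` below stand in for whatever learned predictor the paper actually uses, and `region_adaptive_aggregate` is an assumed name, not the authors' API.

```python
import numpy as np

def region_adaptive_aggregate(branches, logits):
    """Aggregate frequency branches with per-pixel softmax weights.

    branches: array of shape (B, H, W) -- B same-resolution frequency branches
    logits:   array of shape (B, H, W) -- unnormalized per-pixel branch scores
    Returns an (H, W) map where each pixel is a convex combination of branches.
    """
    # Softmax over the branch axis, independently at every pixel.
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * branches).sum(axis=0)

rng = np.random.default_rng(0)
branches = rng.normal(size=(3, 8, 8))   # e.g. low/mid/high-frequency features
logits = rng.normal(size=(3, 8, 8))     # stand-in for a learned predictor
out = region_adaptive_aggregate(branches, logits)
# Softmax weights are non-negative and sum to one, so every output pixel
# lies between the min and max of its branch values.
assert out.shape == (8, 8)
assert np.all(out <= branches.max(axis=0) + 1e-9)
assert np.all(out >= branches.min(axis=0) - 1e-9)
```

The per-pixel (rather than per-image) softmax is what makes the aggregation "region-adaptive": a texture region can emphasize the high-frequency branch while a flat region leans on the low-frequency one.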
Summary
  • Introduction:

    Unlike the single and uniform synthetic degradation in SISR, the LR and HR images in RealSR are captured with digital single-lens reflex (DSLR) cameras and typically contain various complex, non-uniform real-world degradations, including blur, noise, and down-sampling.
  • That is why classic conventional SISR methods (RCAN (Zhang et al 2018a), EDSR (Lim et al 2017), SAN (Dai et al 2019), etc.) cannot handle the RealSR problem well
Tables
  • Table1: Quantitative results on the DRealSR dataset. We compare our OR-Net to the general SISR methods, including Bicubic, VDSR, EDSR, RDN, DDBPN, RCAN, and RealSR methods, including LP-KPN and CDC. We use PSNR, SSIM and LPIPS as evaluation metrics
  • Table2: Performance of different settings in OR-Net, where “bran. = i” means that there are i decomposed frequency branches in FD module of OR-Net
  • Table3: Ablation experiments conducted on DRealSR to study the effectiveness of the proposed RFA module and FEU design in our OR-Net
  • Table4: Quantitative results on the RealSR dataset at ×2 (Wei et al 2020) compared to the state-of-the-art methods in terms of PSNR, SSIM and LPIPS
  • Table5: Quantitative results on the traditional super-resolution datasets, including Set5, Set14, and BSD100, at ×2 (Wei et al 2020) compared to the state-of-the-art methods in terms of PSNR and SSIM
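PSNR, the primary metric in the tables above, has a standard closed form: for images with peak value max_val, PSNR = 10·log10(max_val² / MSE) in decibels. A minimal sketch follows; the helper name `psnr` and the 8-bit `max_val=255` default are assumptions, since the extract does not include the paper's evaluation code.

```python
import numpy as np

def psnr(reference, restored, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 1 gray level on 8-bit images gives
# 10*log10(255^2 / 1) ~= 48.13 dB.
ref = np.full((16, 16), 100.0)
out = ref + 1.0
print(round(psnr(ref, out), 2))  # → 48.13
```

Higher is better, which is why the ~30 dB range typical of RealSR benchmarks reads as harder than the bicubic-SISR setting; SSIM and LPIPS complement it by measuring structural and perceptual similarity rather than raw pixel error.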
Related work
  • Conventional Single Image Super-Resolution

    In the last decade, traditional SISR has achieved great progress, especially via deep learning based approaches (Dong et al 2015; Dong, Loy, and Tang 2016; Lim et al 2017; Haris, Shakhnarovich, and Ukita 2018; Zhang et al 2018a; Dai et al 2019). These methods usually perform well on synthetic degradation (e.g., bicubic down-sampling) but generalize poorly to the complicated realistic distortions of real-world scenarios. This is especially problematic in practical applications, where the target scenes typically have hybrid, complex, non-uniform degradations (e.g., blur, noise, and down-sampling), and paired data for training is rarely readily available.

    Real Image Super-Resolution

    RealSR has drawn increasing attention in recent years. Different from general SISR, which typically focuses on simple and uniform synthetic degradation, RealSR mainly aims to solve the complicated degradations of real-world scenarios. To capture the distortions of real scenes, Chen et al (Chen et al 2019a) design two novel data acquisition strategies. Cai et al (Cai et al 2019) build a real-world super-resolution (RealSR) dataset by adjusting the focal length of a digital camera and introduce the Laplacian pyramid based kernel prediction network (LP-KPN) to solve such non-uniform distortions. Recently, Wei et al (Wei et al 2020) present a large-scale, diverse real-world image super-resolution dataset (DRealSR) and a component divide-and-conquer (CDC) model with a gradient-weighted (GW) loss, achieving great performance in RealSR.
Funding
  • This work was supported in part by NSFC under Grants U1908209 and 61632001, and by the National Key Research and Development Program of China under Grant 2018AAA0101400
Reference
  • Ahmed, N.; Natarajan, T.; and Rao, K. R. 1974. Discrete cosine transform. IEEE Transactions on Computers 100(1): 90–93.
  • Akbari, M.; Liang, J.; Han, J.; and Tu, C. 2020. Generalized Octave Convolutions for Learned Multi-Frequency Image Compression. arXiv preprint arXiv:2002.10032.
  • Cai, J.; Zeng, H.; Yong, H.; Cao, Z.; and Zhang, L. 2019. Toward real-world single image super-resolution: A new benchmark and a new model. In Proceedings of the IEEE International Conference on Computer Vision, 3086–3095.
  • Chen, C.; Xiong, Z.; Tian, X.; Zha, Z.-J.; and Wu, F. 2019a. Camera lens super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1652–1660.
  • Chen, Y.; Dai, X.; Liu, M.; Chen, D.; Yuan, L.; and Liu, Z. 2020. Dynamic convolution: Attention over convolution kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11030–11039.
  • Chen, Y.; Fan, H.; Xu, B.; Yan, Z.; Kalantidis, Y.; Rohrbach, M.; Yan, S.; and Feng, J. 2019b. Drop an octave: Reducing spatial redundancy in convolutional neural networks with octave convolution. In Proceedings of the IEEE International Conference on Computer Vision, 3435–3444.
  • Dai, T.; Cai, J.; Zhang, Y.; Xia, S.-T.; and Zhang, L. 2019. Second-order attention network for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 11065–11074.
  • Dong, C.; Loy, C. C.; He, K.; and Tang, X. 2015. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(2): 295–307.
  • Dong, C.; Loy, C. C.; and Tang, X. 2016. Accelerating the super-resolution convolutional neural network. In European Conference on Computer Vision, 391–407. Springer.
  • Fritsche, M.; Gu, S.; and Timofte, R. 2019. Frequency separation for real-world super-resolution. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 3599–3608. IEEE.
  • Fu, X.; Huang, J.; Ding, X.; Liao, Y.; and Paisley, J. 2017. Clearing the skies: A deep network architecture for single-image rain removal. IEEE Transactions on Image Processing 26(6): 2944–2956.
  • Haris, M.; Shakhnarovich, G.; and Ukita, N. 2018. Deep back-projection networks for super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1664–1673.
  • Hou, Q.; Zhang, L.; Cheng, M.-M.; and Feng, J. 2020. Strip Pooling: Rethinking Spatial Pooling for Scene Parsing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4003–4012.
  • Jia, X.; De Brabandere, B.; Tuytelaars, T.; and Gool, L. V. 2016. Dynamic filter networks. In Advances in Neural Information Processing Systems, 667–675.
  • Kim, D.-W.; Ryun Chung, J.; and Jung, S.-W. 2019. GRDN: Grouped residual dense network for real image denoising and GAN-based real-world noise modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
  • Kim, J.; Kwon Lee, J.; and Mu Lee, K. 2016. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1646–1654.
  • Lai, W.-S.; Huang, J.-B.; Ahuja, N.; and Yang, M.-H. 2018. Fast and accurate image super-resolution with deep Laplacian pyramid networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(11): 2599–2613.
  • Ledig, C.; Theis, L.; Huszar, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. 2017. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4681–4690.
  • Li, X.; Jin, X.; Lin, J.; Liu, S.; Wu, Y.; Yu, T.; Zhou, W.; and Chen, Z. 2020. Learning Disentangled Feature Representation for Hybrid-distorted Image Restoration. In European Conference on Computer Vision, 313–329. Springer.
  • Li, X.; Wang, W.; Hu, X.; and Yang, J. 2019. Selective kernel networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 510–519.
  • Lim, B.; Son, S.; Kim, H.; Nah, S.; and Mu Lee, K. 2017. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 136–144.
  • Lin, X.; Ma, L.; Liu, W.; and Chang, S.-F. 2019. Context-Gated Convolution. arXiv preprint arXiv:1910.05577.
  • Mei, Y.; Fan, Y.; Zhou, Y.; Huang, L.; Huang, T. S.; and Shi, H. 2020. Image Super-Resolution With Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  • Mildenhall, B.; Barron, J. T.; Chen, J.; Sharlet, D.; Ng, R.; and Carroll, R. 2018. Burst denoising with kernel prediction networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2502–2510.
  • Niklaus, S.; Mai, L.; and Liu, F. 2017. Video frame interpolation via adaptive separable convolution. In Proceedings of the IEEE International Conference on Computer Vision, 261–270.
  • Pan, J.; Liu, Y.; Sun, D.; Ren, J. S.; Cheng, M.-M.; Yang, J.; and Tang, J. 2020. Image Formation Model Guided Deep Image Super-Resolution. In AAAI, 11807–11814.
  • Pang, Y.; Li, X.; Jin, X.; Wu, Y.; Liu, J.; Liu, S.; and Chen, Z. 2020. FAN: Frequency Aggregation Network for Real Image Super-resolution. arXiv preprint arXiv:2009.14547.
  • Rao, R. 2002. Wavelet transforms. Encyclopedia of Imaging Science and Technology.
  • Sajjadi, M. S.; Scholkopf, B.; and Hirsch, M. 2017. EnhanceNet: Single image super-resolution through automated texture synthesis. In Proceedings of the IEEE International Conference on Computer Vision, 4491–4500.
  • Tai, Y.; Yang, J.; Liu, X.; and Xu, C. 2017. MemNet: A persistent memory network for image restoration. In Proceedings of the IEEE International Conference on Computer Vision, 4539–4547.
  • Wang, X.; Girshick, R.; Gupta, A.; and He, K. 2018. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7794–7803.
  • Wei, P.; Xie, Z.; Lu, H.; Zhan, Z.; Ye, Q.; Zuo, W.; and Lin, L. 2020. Component Divide-and-Conquer for Real-World Image Super-Resolution. arXiv preprint arXiv:2008.01928.
  • Woo, S.; Park, J.; Lee, J.-Y.; and So Kweon, I. 2018. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), 3–19.
  • Xiao, M.; Zheng, S.; Liu, C.; Wang, Y.; He, D.; Ke, G.; Bian, J.; Lin, Z.; and Liu, T.-Y. 2020. Invertible Image Rescaling. In European Conference on Computer Vision (ECCV).
  • Yu, T.; Guo, Z.; Jin, X.; Wu, S.; Chen, Z.; Li, W.; Zhang, Z.; and Liu, S. 2020. Region Normalization for Image Inpainting. In AAAI, 12733–12740.
  • Zhang, K.; Zuo, W.; and Zhang, L. 2018. Learning a single convolutional super-resolution network for multiple degradations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3262–3271.
  • Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; and Fu, Y. 2018a. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), 286–301.
  • Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; and Fu, Y. 2018b. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2472–2481.
  • Zhang, Z.; Lan, C.; Zeng, W.; Jin, X.; and Chen, Z. 2020. Relation-Aware Global Attention for Person Re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3186–3195.
  • Zhao, F.; Zhao, J.; Yan, S.; and Feng, J. 2018. Dynamic conditional networks for few-shot learning. In Proceedings of the European Conference on Computer Vision (ECCV), 19–35.