Learned Block-based Hybrid Image Compression


Abstract:

Learned image compression based on neural networks has made huge progress thanks to its superiority in learning better representations through non-linear transformation. Unlike traditional hybrid coding frameworks, which are commonly block-based, existing learned image codecs usually process images in a full-resolution manner...

Introduction
  • Image compression is an essential technique to reduce the overhead of image transmission and storage.
  • Learned image compression methods are commonly based on the transformation coding frameworks consisting of transformation, quantization, and entropy modeling.
  • For learned image compression methods, probability-based entropy modeling is an essential part, which includes hyperprior modeling (Balle et al. 2018).
  • The autoregressive context module estimates the probability based on the previously decoded elements in the latent space, which can achieve a bit-rate saving of 15.87%.
  • Although context modeling brings important benefits in compression, it greatly increases time complexity, especially on the decoding side.
  • The entire decoding process must be performed sequentially.
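The sequential bottleneck described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's model: the `entropy_model` callable and the 5x5 causal window stand in for the learned masked-convolution context module.

```python
import numpy as np

def causal_context(latents, i, j, k=5):
    """Gather the already-decoded k x k neighborhood of latent (i, j).

    Positions at or after (i, j) in raster order are left at zero,
    mimicking the masked convolution of autoregressive context models."""
    h, w = latents.shape
    r = k // 2
    ctx = np.zeros((k, k))
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            ii, jj = i + di, j + dj
            # keep only positions decoded before (i, j) in raster order
            if 0 <= ii < h and 0 <= jj < w and (di < 0 or (di == 0 and dj < 0)):
                ctx[di + r, dj + r] = latents[ii, jj]
    return ctx

def decode(entropy_model, h, w):
    """Decoding must visit latents one by one: each probability estimate
    depends on previously decoded elements, so this double loop cannot be
    parallelized -- the source of the decoding-time bottleneck."""
    latents = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            ctx = causal_context(latents, i, j)
            latents[i, j] = entropy_model(ctx)  # decode one symbol
    return latents
```

Encoding, by contrast, can evaluate all contexts in parallel because the ground-truth latents are available; only decoding is forced into raster order.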
Highlights
  • Image compression is an essential technique to reduce the overhead of image transmission and storage
  • With the significant progress of artificial neural networks, learned image compression methods have attracted a lot of attention
  • Unlike existing learned compression methods based on transformation coding, we propose a learned block-based hybrid image compression (LBHIC) method, which introduces a contextual prediction module (CPM) to exploit the correlation between adjacent blocks and a boundary-aware post-processing module (BPM) to remove block artifacts
  • Different from existing learned image compression methods that follow the transformation coding framework, we explore a learned block-based hybrid image compression (LBHIC) framework, which integrates predictive coding to remove the redundancy between adjacent blocks and introduces a post-processing module to reduce block artifacts
  • To remove the block artifacts caused by the padding operation at block edges, we introduce the concept of a boundary mask to guide the post-processing module, and further design the boundary-aware post-processing module (BPM)
  • Block partition brings the possibility of acceleration, CPM effectively exploits the correlation between blocks to improve coding efficiency, and BPM accounts for blocking effects to improve both subjective and objective quality
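The boundary mask that guides the BPM can be illustrated with a small sketch: a binary map that is 1 in a band around each block boundary and 0 elsewhere, which a post-processing network could take as an extra input channel. The block size of 64 and band half-width of 4 are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def boundary_mask(h, w, block=64, width=4):
    """Binary mask marking pixels near internal block boundaries.

    `block` is the partition size and `width` the half-width of the
    band around each boundary (both hypothetical choices here)."""
    mask = np.zeros((h, w), dtype=np.float32)
    for b in range(block, h, block):      # horizontal boundaries
        mask[max(0, b - width):b + width, :] = 1.0
    for b in range(block, w, block):      # vertical boundaries
        mask[:, max(0, b - width):b + width] = 1.0
    return mask
```

Concentrating the post-processing network's attention on these bands is what distinguishes a boundary-aware module from a generic full-frame enhancement filter.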
Methods
  • 3.1 Overview of the Framework

    The overall framework is shown in Figure 2.
  • The authors utilize predictive coding to reduce the redundancy between adjacent decoded blocks.
  • Let Xi,j denote the block in the i-th row and j-th column and X̂i,j the corresponding decoded block; the residual Ri,j is calculated as Ri,j = Xi,j − gpred(X̂i−1,j, X̂i,j−1; θp)
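The raster-order residual pipeline can be sketched end to end. `predict_block` below is a hypothetical stand-in (a simple average of the decoded top and left neighbors) for the learned predictor gpred, used only to make the sketch runnable; the residual coding step is kept lossless for simplicity.

```python
import numpy as np

def predict_block(top, left, shape):
    """Stand-in for gpred: average the available decoded neighbors.

    The paper learns this mapping with a network; the mean is only a
    placeholder so the residual pipeline below executes."""
    neighbors = [b for b in (top, left) if b is not None]
    if not neighbors:
        return np.zeros(shape)           # first block: no context
    return np.mean(neighbors, axis=0)

def encode_blocks(image, bs):
    """Raster-order predictive coding: each block is predicted from its
    decoded top and left neighbors, and only the residual R_ij is coded."""
    h, w = image.shape
    decoded = np.zeros_like(image)
    residuals = {}
    for i in range(0, h, bs):
        for j in range(0, w, bs):
            top = decoded[i - bs:i, j:j + bs] if i > 0 else None
            left = decoded[i:i + bs, j - bs:j] if j > 0 else None
            block = image[i:i + bs, j:j + bs]
            pred = predict_block(top, left, block.shape)
            r = block - pred             # R_ij = X_ij - gpred(...)
            residuals[(i // bs, j // bs)] = r
            # lossless stand-in for "code residual, then reconstruct"
            decoded[i:i + bs, j:j + bs] = pred + r
    return residuals, decoded
```

Because prediction uses decoded rather than original neighbors, the encoder and decoder stay synchronized even when the residual coding is lossy.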
Results
  • 4.1 Experimental Setup

    Datasets: The authors use ImageNet (Deng et al. 2009) as the training set for the first three stages of model training, while the DIV2K dataset (Agustsson and Timofte 2017) and the CLIC 2020 challenge training set are utilized for the fourth training stage.
  • As shown in Figure 6 and Figure 7, the authors visualize RD curves of the LBHIC model and the SOTA methods for image compression in terms of PSNR and MS-SSIM metrics on the Kodak dataset, respectively.
  • Compared with the latest traditional codec, VTM 8.0, the LBHIC framework achieves about a 0.3 dB gain in PSNR at all rate points.
  • To further verify the generalization performance of the model, the authors tested it on the Tecnick dataset; results are shown in the supplementary materials.
  • More subjective quality comparisons are provided in the supplementary materials
Conclusion
  • The authors inherit the advantages of traditional and learned image compression methods to realize a learned block-based hybrid image compression network (LBHIC).
  • The framework includes block partition, contextual prediction module (CPM), and boundary-aware post-processing module (BPM).
  • Block partition brings the possibility of acceleration, CPM effectively exploits the correlation between blocks to improve coding efficiency, and BPM accounts for blocking effects to improve both subjective and objective quality.
  • Experimental results show that the approach achieves SOTA performance and provides an almost 10x improvement in decoding runtime
Summary
  • Objectives:

    This paper aims to address the above limitations and proposes an effective and efficient solution for learned image compression.
Tables
  • Table 1: Average encoding and decoding runtimes and BD-rate results compared with recent works. We utilize VTM 8.0 (intra) as the anchor to calculate the BD-rate
  • Table 2: Performance of BPM compared with the SOTA post-processing methods on the Kodak dataset
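The BD-rate values in Table 1 follow the Bjontegaard metric (Bjontegaard 2001): fit a cubic through each codec's RD points in the (PSNR, log-rate) plane, then average the log-rate difference over the overlapping PSNR range. A minimal sketch of that calculation:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (in percent) of the test codec vs. the
    anchor over their common PSNR range; negative means bitrate savings."""
    # fit cubic polynomials mapping PSNR -> log bitrate
    p_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    # integrate both fits over the overlapping PSNR interval
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    ia, it = np.polyint(p_a), np.polyint(p_t)
    int_a = np.polyval(ia, hi) - np.polyval(ia, lo)
    int_t = np.polyval(it, hi) - np.polyval(it, lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0
```

For example, a codec whose RD curve sits at 90% of the anchor's bitrate for every PSNR yields a BD-rate of -10%.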
Related work
  • Lossy image compression standard codecs have been developed for decades. As one of the most widely used standards, JPEG was first created in 1992. Subsequently, to improve compression performance, standards such as JPEG2000, WebP, BPG, and VVC (intra) were successively proposed. To our knowledge, VVC (intra) has the highest compression performance among standard codecs.

    Recently, learned image compression has attracted great attention to achieve better compression performance in this field. From the perspective of neural topology, learned image compression methods can be divided into two categories: recurrent models and variational autoencoder (VAE) models.
Funding
  • It is worth noting that even compared with the latest traditional standard, VVC (VTM 8.0), our solution saves 4.1% in BD-rate
Reference
  • Agustsson, E.; Mentzer, F.; Tschannen, M.; Cavigelli, L.; Timofte, R.; Benini, L.; and Gool, L. V. 2017. Soft-to-hard vector quantization for end-to-end learning compressible representations. In Advances in Neural Information Processing Systems, 1141–1151.
  • Agustsson, E.; and Theis, L. 2020. Universally Quantized Neural Compression. arXiv preprint arXiv:2006.09952.
  • Agustsson, E.; and Timofte, R. 2017. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 126–135.
  • Asuni, N.; and Giachetti, A. 2014. TESTIMAGES: a Large-scale Archive for Testing Visual Devices and Basic Image Processing Algorithms. In STAG, 63–70.
  • Balle, J.; Laparra, V.; and Simoncelli, E. P. 2016. End-to-end optimization of nonlinear transform codes for perceptual quality. In Picture Coding Symposium (PCS), 2016. IEEE.
  • Balle, J.; Laparra, V.; and Simoncelli, E. P. 2017. End-to-end Optimized Image Compression. In International Conference on Learning Representations (ICLR).
  • Balle, J.; Minnen, D.; Singh, S.; Hwang, S. J.; and Johnston, N. 2018. Variational image compression with a scale hyperprior. In International Conference on Learning Representations (ICLR).
  • Bellard, F. 2015. BPG Image format. URL https://bellard.org/bpg.
  • Bjontegaard, G. 2001. Calculation of average PSNR differences between RD-curves. VCEG-M33.
  • Bross, B.; Chen, J.; and Liu, S. 2018. Versatile video coding (Draft 5). JVET-K1001.
  • Chen, T.; Liu, H.; Ma, Z.; Shen, Q.; Cao, X.; and Wang, Y. 2019. Neural image compression via non-local attention optimization and improved context modeling. arXiv preprint arXiv:1910.06244.
  • Cheng, Z.; Sun, H.; Takeuchi, M.; and Katto, J. 2020. Learned image compression with discretized gaussian mixture likelihoods and attention modules. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7939–7948.
  • Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255. IEEE.
  • Franzen, R. 1999. Kodak lossless true color image suite. Source: http://r0k.us/graphics/kodak 4(2).
  • Guo, Z.; Wu, Y.; Feng, R.; Zhang, Z.; and Chen, Z. 2020. 3-D Context Entropy Model for Improved Practical Image Compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 116–117.
  • Hou, Q.; Zhang, L.; Cheng, M.-M.; and Feng, J. 2020. Strip Pooling: Rethinking Spatial Pooling for Scene Parsing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4003–4012.
  • Hu, Y.; Yang, W.; and Liu, J. 2020. Coarse-to-Fine Hyper-Prior Modeling for Learned Image Compression. In AAAI, 11013–11020.
  • Johnston, N.; Eban, E.; Gordon, A.; and Balle, J. 2019. Computationally efficient neural image compression. arXiv preprint arXiv:1912.08771.
  • Johnston, N.; Vincent, D.; Minnen, D.; Covell, M.; Singh, S.; Chinen, T.; Jin Hwang, S.; Shor, J.; and Toderici, G. 2018. Improved lossy image compression with priming and spatially adaptive bit rates for recurrent networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4385–4393.
  • Kim, D.-W.; Ryun Chung, J.; and Jung, S.-W. 2019. Grdn: Grouped residual dense network for real image denoising and gan-based real-world noise modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
  • Kingma, D. P.; and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Lee, J.; Cho, S.; and Beack, S.-K. 2019. Context-adaptive Entropy Model for End-to-end Optimized Image Compression. In International Conference on Learning Representations (ICLR).
  • Lee, J.; Cho, S.; and Kim, M. 2019. A hybrid architecture of jointly learning image compression and quality enhancement with improved entropy minimization. arXiv preprint arXiv:1912.12817.
  • Li, X.; Sun, S.; Zhang, Z.; and Chen, Z. 2020. Multi-scale Grouped Dense Network for VVC Intra Coding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 158–159.
  • Lin, C.; Yao, J.; Chen, F.; and Wang, L. 2020. A Spatial RNN Codec for End-to-End Image Compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13269–13277.
  • Lu, M.; Chen, T.; Liu, H.; and Ma, Z. 2019. Learned Image Restoration for VVC Intra Coding. In CVPR Workshops.
  • Mentzer, F.; Agustsson, E.; Tschannen, M.; Timofte, R.; and Van Gool, L. 2018. Conditional probability models for deep image compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4394–4402.
  • Minnen, D.; Balle, J.; and Toderici, G. D. 2018. Joint autoregressive and hierarchical priors for learned image compression. In Advances in Neural Information Processing Systems, 10794–10803.
  • Park, B.; Yu, S.; and Jeong, J. 2019. Densely connected hierarchical network for image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
  • Richardson, I. E. 2004. H.264 and MPEG-4 video compression: video coding for next-generation multimedia. John Wiley & Sons.
  • Ronneberger, O.; Fischer, P.; and Brox, T. 2015. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 234–241. Springer.
  • Toderici, G.; O'Malley, S. M.; Hwang, S. J.; Vincent, D.; Minnen, D.; Baluja, S.; Covell, M.; and Sukthankar, R. 2016. Variable rate image compression with recurrent neural networks. In International Conference on Learning Representations (ICLR).
  • Toderici, G.; Vincent, D.; Johnston, N.; Jin Hwang, S.; Minnen, D.; Shor, J.; and Covell, M. 2017. Full resolution image compression with recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5306–5314.
  • Wang, X.; Girshick, R.; Gupta, A.; and He, K. 2018. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7794–7803.
  • Wang, Z.; Simoncelli, E. P.; and Bovik, A. C. 2003. Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, volume 2, 1398–1402. IEEE.
  • Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; and Fu, Y. 2018. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2472–2481.