We propose a general framework for variable-rate image compression and a novel architecture based on convolutional and deconvolutional long short-term memory (LSTM) recurrent networks.
Variable Rate Image Compression with Recurrent Neural Networks.
International Conference on Learning Representations (ICLR), 2016
Abstract: A large fraction of Internet traffic is now driven by requests from mobile devices with relatively small screens and often stringent bandwidth requirements. Due to these factors, it has become the norm for modern graphics-heavy websites to transmit low-resolution, low-bytecount image previews (thumbnails) as part of the initial …
- The task of image compression has been thoroughly examined over the years by researchers and teams such as the Joint Photographic Experts Group, who designed the ubiquitous JPEG and JPEG 2000 (ISO/IEC 15444-1) image formats.
- The higher the resolution of an image, the more likely it is that its component patches will contain mostly low-frequency information.
- This fact is exploited by most image codecs and, as such, these codecs tend to be very efficient at compressing high-resolution images.
- Such assumptions are broken when creating thumbnails from high-resolution natural images: a patch taken from a thumbnail is much more likely to contain difficult-to-compress, high-frequency information.
- The WebP algorithm was proposed to further improve image compression rates (Google, 2015), especially for the high-resolution images that have become more common in recent years.
- We describe various methods for variable-length encoding of image patches using neural networks, and demonstrate that, for the given benchmark, the fully connected long short-term memory (LSTM) model performs on par with JPEG, while the convolutional/deconvolutional LSTM model significantly outperforms JPEG on the Structural Similarity Index (SSIM) perceptual metric.
- An obvious need is to extend the current work to function on arbitrarily large images, taking advantage of spatial redundancy in images in a manner similar to entropy coding
- The algorithms that we present may be extended to work on video, which we believe to be the grand challenge for neural network-based compression
- EXPERIMENTS & ANALYSIS
In order to train the various neural network configurations, the authors used the Adam algorithm proposed by Kingma & Ba (2014).
- The authors experimented with the number of steps needed to encode each patch, varying this from 8 to 16.
- For the fully connected networks, the authors chose to use 8 bits per step for an 8×8 patch, allowing them to fine-tune the compression rate in increments of 8 bits.
- The authors experimented with a binary output of 2 bits per pixel at this resolution, yielding a tunable compression rate with increments of 16 bytes per 32×32 block.
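The arithmetic behind these rate increments can be sketched as follows. This is a minimal illustration using only the figures quoted above; the 8×8 reduced resolution for the convolutional case is inferred from the 16-byte increment and is an assumption, not a detail stated in the summary.

```python
# Fully connected LSTM: 8 bits emitted per step for each 8x8 patch, so the
# rate is tunable in 8-bit increments; e.g. 16 steps -> 128 bits per patch.
fc_bits_per_step = 8
fc_total_bits = fc_bits_per_step * 16          # 128 bits per 8x8 patch
fc_bpp = fc_total_bits / (8 * 8)               # 2.0 bits per pixel

# Convolutional LSTM: a binary output of 2 bits per pixel at a reduced
# resolution (8x8 assumed here) of each 32x32 block gives 128 bits per step.
conv_bits_per_step = 2 * 8 * 8                 # 128 bits
conv_bytes_per_step = conv_bits_per_step // 8  # 16-byte increments per block

print(fc_bpp, conv_bytes_per_step)  # 2.0 16
```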
- EVALUATION PROTOCOL AND METRICS
Evaluating image compression algorithms is a non-trivial task.
- The metric commonly used in this context is the peak signal-to-noise ratio (PSNR); however, PSNR is biased toward algorithms which have been tuned to minimize L2 loss.
- This would not be a fair comparison against methods like JPEG which have been tuned to minimize a form of perceptual loss.
- The final score is the average SSIM over all patches and channels.
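The two metrics discussed above can be sketched in a few lines of numpy. This is an illustrative simplification, not the paper's evaluation code: the SSIM here uses global per-patch statistics rather than the sliding-window form of Wang et al. (2004), and the 8×8 patch size in `mean_ssim` is an assumption.

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    # PSNR is a monotone function of MSE, so ranking codecs by PSNR is
    # equivalent to ranking them by L2 loss -- the bias noted above.
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_patch(x, y, peak=255.0):
    # Single-patch SSIM (Wang et al., 2004) from global patch statistics;
    # full implementations apply this in a sliding window instead.
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    x, y = x.astype(float), y.astype(float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def mean_ssim(ref, test, patch=8):
    # Final score: average SSIM over all non-overlapping patches and
    # channels (8x8 patches are an assumption for illustration).
    h, w, ch = ref.shape
    scores = [ssim_patch(ref[i:i + patch, j:j + patch, c],
                         test[i:i + patch, j:j + patch, c])
              for i in range(0, h - patch + 1, patch)
              for j in range(0, w - patch + 1, patch)
              for c in range(ch)]
    return float(np.mean(scores))
```

For identical images the average SSIM is 1.0; the score decreases as structural distortion grows, which is what makes it a more perceptual comparison than raw PSNR.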
- CONCLUSION & FUTURE WORK
The authors describe various methods for variable-length encoding of image patches using neural networks, and demonstrate that for the given benchmark, the fully-connected LSTM model can perform on par with JPEG, while the convolutional/deconvolutional LSTM model is able to significantly outperform JPEG on the SSIM perceptual metric.
While the current approach gives favorable results versus modern codecs on small images, codecs that include an entropy-coder element tend to improve with greater resolution; by choosing an arbitrarily large test image, it is therefore always possible to defeat an approach like the one described in this work.
- While the authors presented a solution for dynamic bit assignment in the convolutional case, it is not fully satisfactory, as it has the potential to introduce encoding artifacts at patch boundaries.
- Another topic for future work is determining a dynamic bit assignment algorithm that is compatible with the convolutional methods the authors present, while not creating such artifacts.
- The algorithms that the authors present may be extended to work on video, which the authors believe to be the grand challenge for neural network-based compression
- Table 1: Comparison between the proposed methods for a given compression target size (in bytes) on the 32×32 image benchmark.
- The basic principles of using feed-forward neural networks for image compression have been known for some time (Jiang, 1999). In this context, networks can assist or even entirely take over many of the processes used as part of a traditional image compression pipeline: to learn more efficient frequency transforms, more effective quantization techniques, improved predictive coding, etc.
More recently, autoencoder architectures (Hinton & Salakhutdinov, 2006) have become viable as a means of implementing end-to-end compression. A typical compressing autoencoder has three parts: (1) an encoder which consumes an input (e.g., a fixed-dimension image or patch) and transforms it into (2) a bottleneck representing the compressed data, which can then be transformed by (3) a decoder into something resembling the original input. These three elements are trained end-to-end, but during deployment the encoder and decoder are normally used independently.
The bottleneck is often simply a flat neural net layer, which allows the compression rate and visual fidelity of the encoded images to be controlled by adjusting the number of nodes in this layer before training. For some types of autoencoder, encoding the bottleneck as a simple bit vector can be beneficial (Krizhevsky & Hinton, 2011). In neural net-based classification tasks, images are repeatedly downsampled through convolution and pooling operations, and the entire output of the network might be contained in just a single node. In the decoder half of an autoencoder, however, the network must proceed in the opposite direction and convert a short bit vector into a much larger image or image patch. When this upsampling process is spatially-aware, resembling a “backward convolution,” it is commonly referred to as deconvolution (Long et al, 2014).
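The three-part structure with a bit-vector bottleneck can be sketched as a single forward pass. This is a toy illustration of the shapes involved, not the paper's architecture: the weights are untrained random placeholders, and the 8×8 patch and 16-bit bottleneck sizes are assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

patch = rng.random((8, 8))             # input patch in [0, 1)
W_enc = rng.standard_normal((64, 16))  # placeholder encoder weights
W_dec = rng.standard_normal((16, 64))  # placeholder decoder weights

# (1) Encoder: flatten the patch and project it down.
code = patch.reshape(-1) @ W_enc       # continuous bottleneck activations
# (2) Bottleneck as a bit vector: binarize the activations.
bits = (code > 0).astype(np.uint8)     # 16 bits -> 2 bytes per 8x8 patch
# (3) Decoder: map the bits back to a patch-shaped reconstruction.
recon = (bits @ W_dec).reshape(8, 8)

print(bits.shape, recon.shape)  # (16,) (8, 8)
```

During deployment only `bits` would be transmitted: the sender runs the encoder half, the receiver runs the decoder half, which is why the two parts are trained jointly but used independently.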
- On a large-scale benchmark of 32×32 thumbnails, our LSTM-based approaches provide better visual quality than (headerless) JPEG, JPEG 2000, and WebP, with a storage size that is reduced by 10% or more.
- Note that the (de)convolutional LSTM model exhibits perceptual quality levels that are equal to or better than both JPEG and WebP at 4–12% lower average bitrate.
- At the JPEG quality levels used in Figure 4, disabling subsampling (i.e., using 4:4:4 encoding) leads to a costly increase in JPEG’s bitrate: 1.32-1.77 bpp instead of 1.05-1.406 bpp, or 26% greater
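The ~26% figure can be checked directly from the bpp endpoints quoted in the bullet above:

```python
# Ratio of 4:4:4 bitrate to subsampled bitrate at each end of the range.
low = 1.32 / 1.05    # low-quality end:  ~1.257
high = 1.77 / 1.406  # high-quality end: ~1.259
print(round((low - 1) * 100), round((high - 1) * 100))  # 26 26
```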
- Courbariaux, M., Bengio, Y., and David, J.-P. BinaryConnect: Training deep neural networks with binary weights during propagations. In NIPS, 2015.
- Denton, E., Chintala, S., Szlam, A., and Fergus, R. Deep generative image models using a Laplacian pyramid of adversarial networks. arXiv preprint arXiv:1506.05751, 2015.
- Google. WebP Compression Study. https://developers.google.com/speed/webp/docs/webp_study, 2015. Accessed: 2015-11-10.
- Graves, A. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
- Graves, A., Mohamed, A., and Hinton, G. Speech recognition with deep recurrent neural networks. In International Conference on Acoustics, Speech and Signal Processing, 2013.
- Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., and Wierstra, D. Draw: A recurrent neural network for image generation. arXiv preprint arXiv:1502.04623, 2015.
- Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
- Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural Computation, 9(8), 1997.
- ISO/IEC 15444-1. Information technology – JPEG 2000 image coding system. Standard, International Organization for Standardization, Geneva, CH, December 2000.
- Jiang, J. Image compression with neural networks–a survey. Signal Processing: Image Communication, 14:737–760, 1999.
- Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014. URL http://arxiv.org/abs/1412.6980.
- Krizhevsky, A. and Hinton, G. E. Using very deep autoencoders for content-based image retrieval. In European Symposium on Artificial Neural Networks, 2011.
- Krizhevsky, A. Learning multiple layers of features from tiny images. Technical report, 2009.
- Long, J., Shelhamer, E., and Darrell, T. Fully convolutional networks for semantic segmentation. CoRR, abs/1411.4038, 2014. URL http://arxiv.org/abs/1411.4038.
- Ponomarenko, N., Silvestri, F., Egiazarian, K., Carli, M., Astola, J., and Lukin, V. On between-coefficient contrast masking of DCT basis functions. In Proc. 3rd Int'l. Workshop on Video Processing and Quality Metrics, volume 4, 2007.
- Raiko, T., Berglund, M., Alain, G., and Dinh, L. Techniques for learning binary stochastic feedforward neural networks. ICLR, 2015.
- Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-K., and Woo, W.-C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. CoRR, abs/1506.04214, 2015. URL http://arxiv.org/abs/1506.04214.pdf.
- Sutskever, I., Vinyals, O., and Le, Q. V. Sequence to sequence learning with neural networks. CoRR, abs/1409.3215, 2014. URL http://arxiv.org/abs/1409.3215.
- Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
- Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.
- Zaremba, W., Sutskever, I., and Vinyals, O. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329, 2014.