Deep Neural Networks for No-Reference and Full-Reference Image Quality Assessment
IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 206-219, 2018.
Abstract:
We present a deep neural network-based approach to image quality assessment (IQA). The network is trained end-to-end and comprises ten convolutional layers and five pooling layers for feature extraction, and two fully connected layers for regression, which makes it significantly deeper than related IQA models. Unique features of the proposed …
Introduction
- Digital video is ubiquitous today in almost every aspect of life, and applications such as high definition television, video chat, or internet video streaming are used for …
- Manuscript received January 13, 2017; revised July 6, 2017 and September 14, 2017; accepted September 27, 2017. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Kalpana Seshadrinathan. (Sebastian Bosse and Dominique Maniry contributed to this work.) (Corresponding author: Sebastian Bosse.)
Highlights
- This paper studies the use of a deep convolutional neural network (CNN) with an architecture [3] largely inspired by the organization of the primate visual cortex, comprising 10 convolutional layers and 5 pooling layers for feature extraction and 2 fully connected layers for regression, in a general image quality assessment (IQA) setting, and shows that network depth has a significant impact on performance (see the architecture sketch after this list)
- No-Reference Image Quality Assessment: To evaluate the generalization ability of the proposed NR IQA models, we extend the cross-database experiments presented in [26] with our results
- While DIQaM-NR shows superior performance compared to BRISQUE and CORNIA on the CSIQ subset, the proposed models are outperformed by the other state-of-the-art methods when cross-evaluated on the subset of TID2013
- This paper presented a neural network-based approach to FR and NR IQA that allows for feature learning and regression in an end-to-end framework
- The experimental results show that the proposed methods outperform other state-of-the-art approaches for NR as well as for FR IQA and achieve generalization capabilities competitive with state-of-the-art data-driven approaches
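For concreteness, below is a minimal PyTorch sketch of such a VGG-style patch quality regressor with 10 convolutional layers, 5 max-pooling layers, and a 2-layer fully connected regression head. The channel widths, the 32×32 patch size, and the dropout rate are assumptions chosen for illustration, not details taken from this listing.

```python
import torch
import torch.nn as nn

class PatchQualityCNN(nn.Module):
    """VGG-style patch quality regressor: 10 conv layers, 5 max-pool layers,
    2 fully connected layers. Channel widths and 32x32 input size are assumptions."""
    def __init__(self):
        super().__init__()
        layers = []
        in_ch = 3
        for out_ch in (32, 64, 128, 256, 512):            # 5 conv blocks
            layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
                       nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
                       nn.MaxPool2d(2)]                    # 2 convs + 1 pool per block
            in_ch = out_ch
        self.features = nn.Sequential(*layers)             # 10 convs, 5 pools in total
        self.regressor = nn.Sequential(                    # 2 fully connected layers
            nn.Linear(512, 512), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(512, 1))

    def forward(self, x):                                  # x: (N, 3, 32, 32) patches
        f = self.features(x).flatten(1)                    # -> (N, 512) after 5 poolings
        return self.regressor(f)                           # per-patch quality estimate

model = PatchQualityCNN()
patches = torch.randn(8, 3, 32, 32)                        # a batch of 8 RGB patches
print(model(patches).shape)                                # torch.Size([8, 1])
```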
Methods
- The LIVE [42] database comprises 779 quality annotated images based on 29 source reference images that are subject to 5 different types of distortions at different distortion levels.
- The TID2013 image quality database [43] is an extension of the earlier published TID2008 image quality database [45], containing 3000 quality-annotated images based on 25 source reference images distorted by 24 different distortion types at 5 different distortion levels (25 × 24 × 5 = 3000).
Results
- The performances of the three feature fusion schemes are reported for LIVE and TID2013 in Table VIII
- Mere concatenation of the two feature vectors does not fail, but it consistently performs worse than both fusion schemes that exploit the explicit difference of the two feature vectors.
- This suggests that while the model is able to learn the relation between the two feature vectors, providing that relation explicitly helps to improve the performance.
- Note that the feature fusion scheme might affect the learned features as well; other things being equal, it is not guaranteed that the extracted features f_r and f_d are identical across different fusion methods (see the fusion sketch after this list)
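As a rough illustration, the sketch below contrasts plain concatenation of the reference features f_r and distorted features f_d with fusion schemes that include the explicit difference f_r − f_d. The exact set of schemes and the 512-dimensional feature size are assumptions inferred from the discussion above, not the paper's definitive implementation.

```python
import torch

def fuse_features(f_r, f_d, scheme="concat_diff"):
    """Combine reference features f_r and distorted features f_d (both shaped (N, C)).
    The scheme names are assumptions read off the results discussion."""
    if scheme == "concat":            # plain concatenation: the model must learn the relation itself
        return torch.cat([f_r, f_d], dim=1)
    if scheme == "diff":              # explicit difference only
        return f_r - f_d
    if scheme == "concat_diff":       # concatenation plus the explicit difference
        return torch.cat([f_r, f_d, f_r - f_d], dim=1)
    raise ValueError(f"unknown fusion scheme: {scheme}")

f_r = torch.randn(8, 512)             # reference patch features
f_d = torch.randn(8, 512)             # distorted patch features
print(fuse_features(f_r, f_d, "concat").shape)       # torch.Size([8, 1024])
print(fuse_features(f_r, f_d, "concat_diff").shape)  # torch.Size([8, 1536])
```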
Conclusion
- This paper presented a neural network-based approach to FR and NR IQA that allows for feature learning and regression in an end-to-end framework.
- Following the BIECON [28] framework, networks could be pre-trained without supervision by optimizing them to reproduce the quality predictions of an FR IQM, and the pre-trained network employing patch-wise weighting could then be refined by end-to-end training.
- This combines the advantages of the two approaches (see the sketch after this list)
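A minimal sketch of this two-stage idea follows, assuming a generic PyTorch patch regressor and L1 losses. The fr_iqm proxy-labeling function, the optimizer settings, and the loss choices are placeholders for illustration and do not reproduce the exact BIECON or end-to-end training procedures.

```python
import torch
import torch.nn as nn

def pretrain_on_fr_iqm(model, loader, fr_iqm, epochs=1):
    """Stage 1 (BIECON-style, sketched): regress the patch scores of an existing FR IQM,
    so no human opinion scores are needed. `fr_iqm(ref, dist)` is a placeholder."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for ref, dist in loader:                      # pairs of reference/distorted patches
            target = fr_iqm(ref, dist)                # proxy labels from the FR metric
            loss = nn.functional.l1_loss(model(dist).squeeze(1), target)
            opt.zero_grad(); loss.backward(); opt.step()

def finetune_end_to_end(model, loader, epochs=1):
    """Stage 2 (sketched): refine the pre-trained network end-to-end on subjective
    mean opinion scores (MOS), e.g. with patch-wise weighting inside `model`."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-5)
    for _ in range(epochs):
        for dist, mos in loader:                      # distorted patches and their MOS
            loss = nn.functional.l1_loss(model(dist).squeeze(1), mos)
            opt.zero_grad(); loss.backward(); opt.step()
```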
Tables
- Table1: PERFORMANCE COMPARISON ON LIVE AND TID2013 DATABASES
- Table2: SROCC COMPARISON IN CROSS-DATABASE EVALUATION. ALL MODELS ARE TRAINED ON FULL LIVE OR TID2013, RESPECTIVELY, AND TESTED ON EITHER CSIQ, LIVE OR TID2013
- Table3: PERFORMANCE COMPARISON FOR DIFFERENT SUBSETS OF TID2013
- Table4: PLCC ON SELECTED DISTORTION OF TID2013
- Table5: SROCC IN CROSS-DATABASE EVALUATION. ALL MODELS ARE …
- Table6: PERFORMANCE EVALUATION FOR NR IQA ON CLIVE
- Table7: SROCC COMPARISON IN CROSS-DATABASE EVALUATION. ALL MODELS ARE TRAINED ON THE FULL TID2013 DATABASE AND EVALUATED ON CSIQ
Related work
- A. Full-Reference Image Quality Assessment
The simplest and most straightforward image quality metric is the mean square error (MSE) between the reference and the distorted image. Although widely used, it does not correlate well with perceived visual quality [6]. This led to the development of a whole zoo of image quality metrics that strive for better agreement with image quality as perceived by humans [7].
Most popular quality metrics belong to the class of top-down approaches and try to identify and exploit distortion-related changes in image features in order to estimate perceived quality. These kinds of approaches can be found in the FR, RR, and NR domains. The SSIM [8] is probably the most prominent example of these approaches. It considers the sensitivity of the HVS to structural information by pooling luminance similarity (comparing local mean luminance), contrast similarity (comparing local variances), and structural similarity (measured as local covariance). The SSIM was not only extended to multiple scales as the MS-SSIM [9], but the framework of pooling complementary feature similarity maps also served as inspiration for other FR IQMs employing different features, such as the FSIM [10], the GMSD [11], the SR-SIM [12], or HaarPSI [13]. DeepSim [14] extracts feature maps for similarity computation from the different layers of a deep CNN pre-trained for recognition, showing that features learned for image recognition are also meaningful in the context of perceived quality. The difference-of-Gaussian (DOG)-SSIM [15] belongs somewhat to the top-down as well as to the bottom-up domain, as it mimics the frequency bands of the contrast sensitivity function using a DOG-based channel decomposition. The channels are then input to SSIM in order to calculate channel-wise quality values that are pooled by a trained regression model into an overall quality estimate. The MAD [16] distinguishes between supra- and near-threshold distortions to account for different domains of human quality perception.
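For reference, the single-scale SSIM of [8] combines the luminance, contrast, and structure comparisons mentioned above; in its commonly used simplified form it reads:

```latex
\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}
                           {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
```

where mu_x, mu_y are local means, sigma_x^2, sigma_y^2 local variances, sigma_xy the local covariance of the reference and distorted patches, and C_1, C_2 small stabilizing constants.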
Funding
- This work was supported in part by the German Ministry for Education and Research as Berlin Big Data Center under Grant 01IS14013A, in part by the Institute for Information and Communications Technology Promotion through the Korea Government under Grant 2017-0-00451, and in part by DFG
- Müller was supported by the National Research Foundation of Korea through the Ministry of Education, Science, and Technology in the BK21 Program
References
- [1] Z. Wang, A. C. Bovik, and L. Lu, “Why is image quality assessment so difficult?” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), vol. 4, May 2002, pp. IV-3313–IV-3316.
- [2] A. C. Bovik, “Automatic prediction of perceptual image and video quality,” Proc. IEEE, vol. 101, no. 9, pp. 2008–2024, Sep. 2013.
- [3] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. ImageNet Challenge, 2014, pp. 1–10.
- [4] J. Bromley et al., “Signature verification using a ‘Siamese’ time delay neural network,” Int. J. Pattern Recognit. Artif. Intell., vol. 7, no. 4, pp. 669–688, 1993.
- [5] S. Chopra, R. Hadsell, and Y. LeCun, “Learning a similarity metric discriminatively, with application to face verification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2005, pp. 349–356.
- [6] B. Girod, “What’s wrong with mean-squared error?” in Digital Images and Human Vision, A. B. Watson, Ed. Cambridge, MA, USA: MIT Press, 1993, pp. 207–220.
- [7] W. Lin and C.-C. Jay Kuo, “Perceptual visual quality metrics: A survey,” J. Vis. Commun. Image Represent., vol. 22, no. 4, pp. 297–312, 2011.
- [8] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
- [9] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in Proc. 37th Asilomar Conf. Signals, Syst. Comput., vol. 2, Nov. 2003, pp. 1398–1402.
- [10] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Trans. Image Process., vol. 20, no. 8, pp. 2378–2386, Aug. 2011.
- [11] W. Xue, L. Zhang, X. Mou, and A. C. Bovik, “Gradient magnitude similarity deviation: A highly efficient perceptual image quality index,” IEEE Trans. Image Process., vol. 23, no. 2, pp. 668–695, Feb. 2014.
- [12] L. Zhang and H. Li, “SR-SIM: A fast and high performance IQA index based on spectral residual,” in Proc. 19th IEEE Int. Conf. Image Process., Sep. 2012, pp. 1473–1476.
- [13] R. Reisenhofer, S. Bosse, G. Kutyniok, and T. Wiegand, “A Haar wavelet-based perceptual similarity index for image quality assessment,” arXiv preprint arXiv:1607.06140, 2016.
- [14] F. Gao, Y. Wang, P. Li, M. Tan, J. Yu, and Y. Zhu, “DeepSim: Deep similarity for image quality assessment,” Neurocomputing, vol. 257, pp. 104–114, Sep. 2017.
- [15] S.-C. Pei and L.-H. Chen, “Image quality assessment using human visual DOG model fused with random forest,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 3282–3292, Nov. 2015.
- [16] E. C. Larson and D. M. Chandler, “Most apparent distortion: Full-reference image quality assessment and the role of strategy,” J. Electron. Imag., vol. 19, no. 1, pp. 011006-1–011006-21, 2010.
- [17] V. V. Lukin, N. N. Ponomarenko, O. I. Ieremeiev, K. O. Egiazarian, and J. Astola, “Combining full-reference image visual quality metrics by neural network,” Proc. SPIE, vol. 9394, pp. 93940K, Mar. 2015.
- [18] A. K. Moorthy and A. C. Bovik, “Blind image quality assessment: From natural scene statistics to perceptual quality,” IEEE Trans. Image Process., vol. 20, no. 12, pp. 3350–3364, Dec. 2011.
- [19] M. A. Saad, A. C. Bovik, and C. Charrier, “Blind image quality assessment: A natural scene statistics approach in the DCT domain,” IEEE Trans. Image Process., vol. 21, no. 8, pp. 3339–3352, Aug. 2012.
- [20] A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality assessment in the spatial domain,” IEEE Trans. Image Process., vol. 21, no. 12, pp. 4695–4708, Dec. 2012.
- [21] A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a ‘completely blind’ image quality analyzer,” IEEE Signal Process. Lett., vol. 20, no. 3, pp. 209–212, Mar. 2013.
- [22] D. Ghadiyaram and A. C. Bovik. (2015). LIVE in the Wild Image Quality Challenge Database. [Online]. Available: http://live.ece.utexas.edu/research/ChallengeDB/index.html
- [23] D. Ghadiyaram and A. C. Bovik, “Massive online crowdsourced study of subjective and objective picture quality,” IEEE Trans. Image Process., vol. 25, no. 1, pp. 372–387, Jan. 2016.
- [24] P. Ye, J. Kumar, L. Kang, and D. Doermann, “Unsupervised feature learning framework for no-reference image quality assessment,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 1098–1105.
- [25] P. Zhang, W. Zhou, L. Wu, and H. Li, “SOM: Semantic obviousness metric for image quality assessment,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 2394–2402.
- [26] L. Zhang, Z. Gu, X. Liu, H. Li, and J. Lu, “Training quality-aware filters for no-reference image quality assessment,” IEEE MultiMedia, vol. 21, no. 4, pp. 67–75, Oct./Dec. 2014.
- [27] L. Kang, P. Ye, Y. Li, and D. Doermann, “Convolutional neural networks for no-reference image quality assessment,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, pp. 1733–1740.
- [28] J. Kim and S. Lee, “Fully deep blind image quality predictor,” IEEE J. Sel. Topics Signal Process., vol. 11, no. 1, pp. 206–220, Feb. 2017.
- [29] S. Bosse, D. Maniry, T. Wiegand, and W. Samek, “A deep neural network for image quality assessment,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2016, pp. 3773–3777.
- [30] S. Bosse, D. Maniry, K.-R. Müller, T. Wiegand, and W. Samek, “Neural network-based full-reference image quality assessment,” in Proc. Picture Coding Symp. (PCS), 2016, pp. 1–5.
- [31] W. Zhang, A. Borji, Z. Wang, P. Le Callet, and H. Liu, “The application of visual saliency models in objective image quality assessment: A statistical evaluation,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 6, pp. 1266–1278, Jun. 2016.
- [32] L. Zhang, Y. Shen, and H. Li, “VSI: A visual saliency-induced index for perceptual image quality assessment,” IEEE Trans. Image Process., vol. 23, no. 10, pp. 4270–4281, Aug. 2014.
- [33] O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
- [34] R. Girshick, “Fast R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1440–1448.
- [35] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 3431–3440.
- [36] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. 27th Int. Conf. Mach. Learn., vol. 3. 2010, pp. 807–814.
- [37] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, pp. 1929–1958, Jun. 2014.
- [38] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
- [39] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient BackProp,” in Neural Networks: Tricks of the Trade (Lecture Notes in Computer Science), vol. 7700. Springer, 2012, pp. 9–48.
- [40] D. P. Kingma and J. Ba. (2014). “Adam: A method for stochastic optimization.” [Online]. Available: https://arxiv.org/abs/1412.6980
- [41] L. Prechelt, “Early stopping—But when?” in Neural Networks: Tricks of the Trade. Springer, 2012, pp. 53–67.
- [42] H. R. Sheikh, M. F. Sabir, and A. C. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Trans. Image Process., vol. 15, no. 11, pp. 3440–3451, Nov. 2006.
- [43] N. Ponomarenko et al., “Color image database TID2013: Peculiarities and preliminary results,” in Proc. 4th Eur. Workshop Vis. Inf. Process. (EUVIP), 2013, pp. 106–111.
- [44] E. C. Larson and D. M. Chandler. (2009). Consumer Subjective Image Quality Database. [Online]. Available: http://vision.okstate.edu/index.php
- [45] N. Ponomarenko et al., “TID2008—A database for evaluation of full-reference visual quality assessment metrics,” Adv. Mod. Radioelectron., vol. 10, no. 4, pp. 30–45, 2009.
- [46] R. Soundararajan and A. C. Bovik, “RRED indices: Reduced reference entropic differencing for image quality assessment,” IEEE Trans. Image Process., vol. 21, no. 2, pp. 517–526, Feb. 2012.
- [47] S. Bosse et al., “Assessing perceived image quality using steady-state visual evoked potentials and spatio-spectral decomposition,” IEEE Trans. Circuits Syst. Video Technol., to be published, doi: 10.1109/TCSVT.2017.2694807.
- [48] S. Bosse, K.-R. Müller, T. Wiegand, and W. Samek, “Brain-computer interfacing for multimedia quality assessment,” in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC), Oct. 2016, pp. 003742–003747.
- [49] U. Engelke et al., “Psychophysiology-based QoE assessment: A survey,” IEEE J. Sel. Topics Signal Process., vol. 11, no. 1, pp. 6–21, Feb. 2017.
- [50] S. Bosse, M. Siekmann, T. Wiegand, and W. Samek, “A perceptually relevant shearlet-based adaptation of the PSNR,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 315–319.
- [51] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation,” PLoS ONE, vol. 10, no. 7, p. e0130140, 2015.
- [52] G. Montavon, W. Samek, and K.-R. Müller. (2017). “Methods for interpreting and understanding deep neural networks.” [Online]. Available: https://arxiv.org/abs/1706.07979