Deep Neural Networks for No-Reference and Full-Reference Image Quality Assessment

IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 206–219, 2018.

DOI: https://doi.org/10.1109/TIP.2017.2760518

Abstract:

We present a deep neural network-based approach to image quality assessment (IQA). The network is trained end-to-end and comprises ten convolutional layers and five pooling layers for feature extraction, and two fully connected layers for regression, which makes it significantly deeper than related IQA models. Unique features of the proposed architecture are that it can be used, with slight adaptations, in a no-reference (NR) as well as in a full-reference (FR) IQA setting, and that it allows for joint learning of local quality and local weights, i.e., the relative importance of local quality to the global quality estimate, in a unified framework.

Introduction
  • Digital video is ubiquitous today in almost every aspect of life, and applications such as high definition television, video chat, or internet video streaming are used for […]

    Manuscript received January 13, 2017; revised July 6, 2017 and September 14, 2017; accepted September 27, 2017.
  • The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Kalpana Seshadrinathan. (Sebastian Bosse and Dominique Maniry contributed to this work.) (Corresponding author: Sebastian Bosse.)
Highlights
  • This paper studies the use of a deep convolutional neural network (CNN) with an architecture [3] largely inspired by the organization of the primate visual cortex, comprising 10 convolutional layers and 5 pooling layers for feature extraction and 2 fully connected layers for regression, in a general image quality assessment (IQA) setting, and shows that network depth has a significant impact on performance (a minimal architecture sketch follows this list)
  • No-Reference Image Quality Assessment: For evaluating the generalization ability of the proposed NR IQA models, we extend the cross-database experiments presented in [26] with our results
  • While DIQaM-NR shows superior performance compared to BRISQUE and CORNIA on the CSIQ subset, the proposed models are outperformed by the other state-of-the-art methods when cross-evaluated on the subset of TID2013
  • This paper presented a neural network-based approach to FR and NR IQA that allows for feature learning and regression in an end-to-end framework
  • The experimental results show that the proposed methods outperform other state-of-the-art approaches for NR as well as for FR IQA and achieve generalization capabilities competitive with state-of-the-art data-driven approaches
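Based on the layer counts given above (10 convolutional layers, 5 pooling layers, 2 fully connected layers), the following is a minimal PyTorch sketch of such a VGG-inspired patch network. The channel widths (32 up to 512), the 32×32 patch size, and the dropout in the regression head are assumptions for illustration, not the authors' released configuration.

```python
import torch
import torch.nn as nn

class IQAPatchNet(nn.Module):
    """Sketch of a 10-conv / 5-pool feature extractor with a 2-layer
    regression head; channel widths and patch size are assumptions."""
    def __init__(self):
        super().__init__()
        def block(c_in, c_out):
            # two 3x3 convolutions followed by 2x2 max pooling
            return [nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
                    nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
                    nn.MaxPool2d(2)]
        widths = [3, 32, 64, 128, 256, 512]  # assumed channel progression
        layers = []
        for c_in, c_out in zip(widths[:-1], widths[1:]):
            layers += block(c_in, c_out)     # 5 blocks -> 10 conv + 5 pool
        self.features = nn.Sequential(*layers)
        self.regressor = nn.Sequential(      # 2 fully connected layers
            nn.Linear(512, 512), nn.ReLU(inplace=True),
            nn.Dropout(0.5),                 # assumed regularization
            nn.Linear(512, 1))

    def forward(self, patch):
        # patch: (N, 3, 32, 32); five 2x2 poolings reduce 32x32 to 1x1
        f = self.features(patch).flatten(1)  # -> (N, 512)
        return self.regressor(f)             # per-patch quality estimate
```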
Methods
  • The LIVE [42] database comprises 779 quality-annotated images based on 29 source reference images that are subjected to 5 different distortion types at different distortion levels.
  • The TID2013 image quality database [43] is an extension of the earlier published TID2008 image quality database [45], containing 3000 quality-annotated images based on 25 source reference images distorted by 24 different distortion types at 5 distortion levels each.
Results
  • The performances of the three feature fusion schemes are reported for LIVE and TID2013 in Table VIII
  • Mere concatenation of the two feature vectors does not fail outright, but it consistently performs worse than both fusion schemes that exploit the explicit difference of the two feature vectors.
  • This suggests that while the model is able to learn the relation between the two feature vectors, providing that relation explicitly helps to improve the performance.
  • Note that the feature fusion scheme might affect the learned features as well; other things being equal, it is not guaranteed that the extracted features f_r and f_d are identical across fusion methods (see the fusion sketch below)
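To make the three feature fusion schemes concrete, here is a minimal sketch assuming the reference and distorted patches are encoded by a shared (Siamese) feature extractor into vectors f_r and f_d; the function name and the assumption that fusion happens on flat per-patch vectors are illustrative, not taken from the paper's code.

```python
import torch

def fuse(f_r, f_d, scheme="concat_diff"):
    """Fuse reference/distorted patch features f_r, f_d of shape (N, D).
    The three variants mirror the fusion schemes compared above."""
    if scheme == "diff":            # explicit difference only
        return f_r - f_d
    if scheme == "concat":          # mere concatenation
        return torch.cat([f_r, f_d], dim=1)
    if scheme == "concat_diff":     # concatenation plus explicit difference
        return torch.cat([f_r, f_d, f_r - f_d], dim=1)
    raise ValueError(f"unknown fusion scheme: {scheme}")
```

The regression head then operates on the fused vector, so its input width depends on the chosen scheme (D, 2D, or 3D).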
Conclusion
  • This paper presented a neural network-based approach to FR and NR IQA that allows for feature learning and regression in an end-to-end framework.
  • Following the BIECON [28] framework, networks could be pre-trained without supervision by optimizing them to reproduce the quality predictions of a FR IQM, and a pre-trained network employing patch-wise weighting could then be refined by end-to-end training (a sketch of such patch-wise weighted aggregation follows this list).
  • This would combine the advantages of the two approaches.
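As an illustration of the patch-wise weighting mentioned above, this sketch aggregates per-patch quality estimates into a global score by a normalized weighted average; keeping the weights positive with a ReLU plus a small epsilon is an implementation assumption.

```python
import torch

def weighted_average_pooling(q, w_raw, eps=1e-6):
    """q: per-patch quality estimates, shape (n_patches,).
    w_raw: unconstrained per-patch weight activations, same shape.
    Returns a scalar global quality estimate."""
    w = torch.relu(w_raw) + eps   # keep weights positive (assumed trick)
    return (w * q).sum() / w.sum()
```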
Tables
  • Table 1: PERFORMANCE COMPARISON ON LIVE AND TID2013 DATABASES
  • Table 2: SROCC COMPARISON IN CROSS-DATABASE EVALUATION. ALL MODELS ARE TRAINED ON FULL LIVE OR TID2013, RESPECTIVELY, AND TESTED ON EITHER CSIQ, LIVE OR TID2013
  • Table 3: PERFORMANCE COMPARISON FOR DIFFERENT SUBSETS OF TID2013
  • Table 4: PLCC ON SELECTED DISTORTIONS OF TID2013
  • Table 5: SROCC IN CROSS-DATABASE EVALUATION FROM [26]. ALL MODELS ARE […]
  • Table 6: PERFORMANCE EVALUATION FOR NR IQA ON CLIVE
  • Table 7: SROCC COMPARISON IN CROSS-DATABASE EVALUATION. ALL MODELS ARE TRAINED ON THE FULL TID2013 DATABASE AND EVALUATED ON CSIQ
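SROCC and PLCC, as reported in these tables, are the Spearman rank-order and Pearson linear correlation coefficients between predicted and subjective quality scores; a minimal SciPy sketch with made-up example arrays:

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

# hypothetical predicted scores and subjective ratings (e.g., DMOS)
pred = np.array([0.81, 0.35, 0.62, 0.90, 0.15])
mos  = np.array([0.78, 0.40, 0.55, 0.95, 0.20])

srocc, _ = spearmanr(pred, mos)  # rank-order (monotonic) agreement
plcc, _ = pearsonr(pred, mos)    # linear agreement
print(f"SROCC={srocc:.3f}  PLCC={plcc:.3f}")
```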
Related work
  • A. Full-Reference Image Quality Assessment

    The simplest and most straightforward image quality metric is the mean squared error (MSE) between the reference and the distorted image. Although widely used, it does not correlate well with perceived visual quality [6]. This led to the development of a whole zoo of image quality metrics that strive for better agreement with image quality as perceived by humans [7].

    Most popular quality metrics belong to the class of top-down approaches and try to identify and exploit distortion-related changes in image features in order to estimate perceived quality. These kinds of approaches can be found in the FR, RR, and NR domains. The SSIM [8] is probably the most prominent example of these approaches. It accounts for the sensitivity of the HVS to structural information by pooling luminance similarity (comparing local mean luminance), contrast similarity (comparing local variances), and structural similarity (measured as local covariance); a simplified sketch is given at the end of this subsection. The SSIM was not only extended to multiple scales as the MS-SSIM [9]; the framework of pooling complementary feature similarity maps also served as inspiration for other FR IQMs employing different features, such as the FSIM [10], the GMSD [11], the SR-SIM [12], or HaarPSI [13]. DeepSim [14] extracts feature maps for similarity computation from the different layers of a deep CNN pre-trained for recognition, showing that features learned for image recognition are also meaningful in the context of perceived quality. The difference-of-Gaussian (DOG)-SSIM [15] belongs somewhat to the top-down as well as to the bottom-up domain, as it mimics the frequency bands of the contrast sensitivity function using a DOG-based channel decomposition. The channels are then input to SSIM in order to calculate channel-wise quality values, which are pooled by a trained regression model into an overall quality estimate. The MAD [16] distinguishes between supra- and near-threshold distortions to account for different domains of human quality perception.
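Since the text above describes how SSIM pools luminance, contrast, and structure comparisons, a simplified sketch may help; it uses global image statistics instead of the standard local sliding window, with the usual stabilizing constants:

```python
import numpy as np

def ssim_global(x, y, L=255.0, k1=0.01, k2=0.03):
    """Simplified SSIM using whole-image statistics; the standard metric
    computes these in local sliding windows and averages the map."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    # luminance and contrast/structure terms in the collapsed two-factor form
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```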
Funding
  • This work was supported in part by the German Ministry for Education and Research as Berlin Big Data Center under Grant 01IS14013A, in part by the Institute for Information and Communications Technology Promotion through the Korea Government under Grant 2017-0-00451, and in part by the DFG.
  • K.-R. Müller was supported by the National Research Foundation of Korea through the Ministry of Education, Science, and Technology in the BK21 Program.
References
  • [1] Z. Wang, A. C. Bovik, and L. Lu, “Why is image quality assessment so difficult?” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), vol. 4, May 2002, pp. IV-3313–IV-3316.
  • [2] A. C. Bovik, “Automatic prediction of perceptual image and video quality,” Proc. IEEE, vol. 101, no. 9, pp. 2008–2024, Sep. 2013.
  • [3] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. ImageNet Challenge, 2014, pp. 1–10.
  • [4] J. Bromley et al., “Signature verification using a ‘Siamese’ time delay neural network,” Int. J. Pattern Recognit. Artif. Intell., vol. 7, no. 4, pp. 669–688, 1993.
  • [5] S. Chopra, R. Hadsell, and Y. LeCun, “Learning a similarity metric discriminatively, with application to face verification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2005, pp. 349–356.
  • [6] B. Girod, “What’s wrong with mean-squared error?” in Digital Images and Human Vision, A. B. Watson, Ed. Cambridge, MA, USA: MIT Press, 1993, pp. 207–220.
  • [7] W. Lin and C.-C. Jay Kuo, “Perceptual visual quality metrics: A survey,” J. Vis. Commun. Image Represent., vol. 22, no. 4, pp. 297–312, 2011.
  • [8] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
  • [9] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in Proc. 37th Asilomar Conf. Signals, Syst. Comput., vol. 2, Nov. 2003, pp. 1398–1402.
  • [10] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Trans. Image Process., vol. 20, no. 8, pp. 2378–2386, Aug. 2011.
  • [11] W. Xue, L. Zhang, X. Mou, and A. C. Bovik, “Gradient magnitude similarity deviation: A highly efficient perceptual image quality index,” IEEE Trans. Image Process., vol. 23, no. 2, pp. 684–695, Feb. 2014.
  • [12] L. Zhang and H. Li, “SR-SIM: A fast and high performance IQA index based on spectral residual,” in Proc. 19th IEEE Int. Conf. Image Process., Sep. 2012, pp. 1473–1476.
  • [13] R. Reisenhofer, S. Bosse, G. Kutyniok, and T. Wiegand, “A Haar wavelet-based perceptual similarity index for image quality assessment,” arXiv preprint arXiv:1607.06140, 2016.
  • [14] F. Gao, Y. Wang, P. Li, M. Tan, J. Yu, and Y. Zhu, “DeepSim: Deep similarity for image quality assessment,” Neurocomputing, vol. 257, pp. 104–114, Sep. 2017.
  • [15] S.-C. Pei and L.-H. Chen, “Image quality assessment using human visual DOG model fused with random forest,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 3282–3292, Nov. 2015.
  • [16] E. C. Larson and D. M. Chandler, “Most apparent distortion: Full-reference image quality assessment and the role of strategy,” J. Electron. Imag., vol. 19, no. 1, pp. 011006-1–011006-21, 2010.
  • [17] V. V. Lukin, N. N. Ponomarenko, O. I. Ieremeiev, K. O. Egiazarian, and J. Astola, “Combining full-reference image visual quality metrics by neural network,” Proc. SPIE, vol. 9394, p. 93940K, Mar. 2015.
  • [18] A. K. Moorthy and A. C. Bovik, “Blind image quality assessment: From natural scene statistics to perceptual quality,” IEEE Trans. Image Process., vol. 20, no. 12, pp. 3350–3364, Dec. 2011.
  • [19] M. A. Saad, A. C. Bovik, and C. Charrier, “Blind image quality assessment: A natural scene statistics approach in the DCT domain,” IEEE Trans. Image Process., vol. 21, no. 8, pp. 3339–3352, Aug. 2012.
  • [20] A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality assessment in the spatial domain,” IEEE Trans. Image Process., vol. 21, no. 12, pp. 4695–4708, Dec. 2012.
  • [21] A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a ‘completely blind’ image quality analyzer,” IEEE Signal Process. Lett., vol. 20, no. 3, pp. 209–212, Mar. 2013.
  • [22] D. Ghadiyaram and A. C. Bovik. (2015). LIVE in the Wild Image Quality Challenge Database. [Online]. Available: http://live.ece.utexas.edu/research/ChallengeDB/index.html
  • [23] D. Ghadiyaram and A. C. Bovik, “Massive online crowdsourced study of subjective and objective picture quality,” IEEE Trans. Image Process., vol. 25, no. 1, pp. 372–387, Jan. 2016.
  • [24] P. Ye, J. Kumar, L. Kang, and D. Doermann, “Unsupervised feature learning framework for no-reference image quality assessment,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 1098–1105.
  • [25] P. Zhang, W. Zhou, L. Wu, and H. Li, “SOM: Semantic obviousness metric for image quality assessment,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 2394–2402.
  • [26] L. Zhang, Z. Gu, X. Liu, H. Li, and J. Lu, “Training quality-aware filters for no-reference image quality assessment,” IEEE MultiMedia, vol. 21, no. 4, pp. 67–75, Oct./Dec. 2014.
  • [27] L. Kang, P. Ye, Y. Li, and D. Doermann, “Convolutional neural networks for no-reference image quality assessment,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, pp. 1733–1740.
  • [28] J. Kim and S. Lee, “Fully deep blind image quality predictor,” IEEE J. Sel. Topics Signal Process., vol. 11, no. 1, pp. 206–220, Feb. 2017.
  • [29] S. Bosse, D. Maniry, T. Wiegand, and W. Samek, “A deep neural network for image quality assessment,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2016, pp. 3773–3777.
  • [30] S. Bosse, D. Maniry, K.-R. Müller, T. Wiegand, and W. Samek, “Neural network-based full-reference image quality assessment,” in Proc. Picture Coding Symp. (PCS), 2016, pp. 1–5.
  • [31] W. Zhang, A. Borji, Z. Wang, P. Le Callet, and H. Liu, “The application of visual saliency models in objective image quality assessment: A statistical evaluation,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 6, pp. 1266–1278, Jun. 2016.
  • [32] L. Zhang, Y. Shen, and H. Li, “VSI: A visual saliency-induced index for perceptual image quality assessment,” IEEE Trans. Image Process., vol. 23, no. 10, pp. 4270–4281, Oct. 2014.
  • [33] O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
  • [34] R. Girshick, “Fast R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1440–1448.
  • [35] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 3431–3440.
  • [36] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. 27th Int. Conf. Mach. Learn., 2010, pp. 807–814.
  • [37] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, pp. 1929–1958, Jun. 2014.
  • [38] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
  • [39] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient BackProp,” in Neural Networks: Tricks of the Trade (Lecture Notes in Computer Science), vol. 7700. Springer, 2012, pp. 9–48.
  • [40] D. P. Kingma and J. Ba. (2014). “Adam: A method for stochastic optimization.” [Online]. Available: https://arxiv.org/abs/1412.6980
  • [41] L. Prechelt, “Early stopping—But when?” in Neural Networks: Tricks of the Trade. Springer, 2012, pp. 53–67.
  • [42] H. R. Sheikh, M. F. Sabir, and A. C. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Trans. Image Process., vol. 15, no. 11, pp. 3440–3451, Nov. 2006.
  • [43] N. Ponomarenko et al., “Color image database TID2013: Peculiarities and preliminary results,” in Proc. 4th Eur. Workshop Vis. Inf. Process. (EUVIP), 2013, pp. 106–111.
  • [44] E. C. Larson and D. M. Chandler. (2009). Consumer Subjective Image Quality Database. [Online]. Available: http://vision.okstate.edu/index.php
  • [45] N. Ponomarenko et al., “TID2008—A database for evaluation of full-reference visual quality assessment metrics,” Adv. Mod. Radioelectron., vol. 10, no. 4, pp. 30–45, 2009.
  • [46] R. Soundararajan and A. C. Bovik, “RRED indices: Reduced reference entropic differencing for image quality assessment,” IEEE Trans. Image Process., vol. 21, no. 2, pp. 517–526, Feb. 2012.
  • [47] S. Bosse et al., “Assessing perceived image quality using steady-state visual evoked potentials and spatio-spectral decomposition,” IEEE Trans. Circuits Syst. Video Technol., to be published, doi: 10.1109/TCSVT.2017.2694807.
  • [48] S. Bosse, K.-R. Müller, T. Wiegand, and W. Samek, “Brain-computer interfacing for multimedia quality assessment,” in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC), Oct. 2016, pp. 003742–003747.
  • [49] U. Engelke et al., “Psychophysiology-based QoE assessment: A survey,” IEEE J. Sel. Topics Signal Process., vol. 11, no. 1, pp. 6–21, Feb. 2017.
  • [50] S. Bosse, M. Siekmann, T. Wiegand, and W. Samek, “A perceptually relevant shearlet-based adaptation of the PSNR,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 315–319.
  • [51] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation,” PLoS ONE, vol. 10, no. 7, p. e0130140, 2015.
  • [52] G. Montavon, W. Samek, and K.-R. Müller. (2017). “Methods for interpreting and understanding deep neural networks.” [Online]. Available: https://arxiv.org/abs/1706.07979