An Unsupervised Information-Theoretic Perceptual Quality Metric

NeurIPS 2020 (2020)

TL;DR: We show that the Perceptual Information Metric (PIM) is competitive with supervised metrics on the recent and challenging BAPPS image quality assessment dataset.
Abstract

Tractable models of human perception have proved to be challenging to build. Hand-designed models such as MS-SSIM remain popular predictors of human image quality judgements due to their simplicity and speed. Recent deep learning approaches can perform better, but they rely on supervised data, which can be costly to gather: large …

Introduction
  • Many vision tasks require the assessment of subjective image quality for evaluation, including compression and restoration problems such as denoising, deblurring, colorization, etc.
  • The field has been dominated by simple models with few parameters that are hand-adjusted to correlate well with human mean opinion scores (MOS), such as SSIM and variants [32, 31] (a minimal usage sketch of one such baseline follows this list)
  • Models in this class capture well-documented phenomena observed in visual psychology, such as spatial-frequency-dependent contrast sensitivity [29], or are based on models of early sensory neurons, such as divisive normalization, which explains luminance and/or contrast adaptivity [16].
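As a concrete illustration of this class of baselines, the snippet below computes MS-SSIM between a reference and a distorted image using TensorFlow's built-in implementation. The random test images and noise level are stand-ins for illustration, not values from the paper.

    import tensorflow as tf

    # Hand-designed baseline: MS-SSIM between a reference and a distorted
    # image (inputs are float tensors in [0, 1]; higher means more similar).
    ref = tf.random.uniform((1, 256, 256, 3))
    noise = tf.random.normal(ref.shape, stddev=0.05)
    dist = tf.clip_by_value(ref + noise, 0.0, 1.0)
    score = tf.image.ssim_multiscale(ref, dist, max_val=1.0)
    print(float(score[0]))  # per-image MS-SSIM score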
Highlights
  • We conjecture that a metric built on an image representation that efficiently encodes temporally persistent visual information will make better predictions about human visual perception. We find that this metric is competitive with the fully supervised LPIPS model [36] on the triplet dataset of human ratings published in the same paper (BAPPS-2AFC), and outperforms that model on the corresponding just-noticeable-difference dataset (BAPPS-JND); a scoring sketch for the 2AFC task follows this list
  • We demonstrate that making accurate predictions of human image quality judgements does not require supervised training
  • Our model is consistent with the efficient coding and slowness principles formulated in computational neuroscience, and we demonstrate that our basic approach is not overly dependent on the implementation details of the computational architecture, such as the multi-scale transform or the precise neural network architecture
  • While we cannot claim that the Perceptual Information Metric (PIM) is free of bias, it is completely unsupervised, which removes one possible source of bias
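To make the 2AFC evaluation concrete, here is a minimal scoring sketch. The convention assumed below (p is the fraction of raters who judged x1 closer to the reference, and the metric earns p when it agrees) follows common practice for BAPPS; the paper's exact bookkeeping may differ.

    import numpy as np

    def two_afc_score(d0, d1, p):
        """Mean 2AFC agreement between a metric and human raters.

        d0, d1: arrays of metric distances from each reference patch to
        the two distorted patches x0 and x1. p: fraction of raters who
        judged x1 the closer patch (an assumed convention). The metric
        earns p when it also prefers x1, 1 - p when it prefers x0, and
        0.5 credit on exact ties."""
        d0, d1, p = map(np.asarray, (d0, d1, p))
        prefer_x1 = d1 < d0
        tie = d1 == d0
        return float(np.where(tie, 0.5, np.where(prefer_x1, p, 1.0 - p)).mean())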
Results
  • The authors assess the properties of the unsupervised representation in four ways. First, the authors use PIM to make predictions on a dataset of human image quality ratings, previously collected under a two-alternative forced choice (2AFC) paradigm, comparing it to other recently proposed metrics on this task.
  • The authors posit that shifting an image by a small number of pixels should have only a negligible effect on human perceptual decision making.
  • The authors quantify this effect for PIM and a variety of metrics (a sketch of the shift protocol follows this list).
  • The authors generalize this experiment to gather intuitions about the relative weighting of different types of distortions via the ImageNet-C dataset [17].
  • The authors assess the robustness of the approach in a number of ablation experiments.
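A minimal sketch of the shift experiment referenced above. Here metric_fn is a hypothetical callable standing in for PIM, MS-SSIM, etc., and cropping (rather than wrap-around) is an assumption about how borders are handled; the paper's exact protocol may differ.

    import numpy as np

    def shift_sensitivity(metric_fn, ref, dist, k=2):
        """Change in metric distance when one input is shifted by k pixels.

        metric_fn: hypothetical callable mapping two HxWx3 arrays in
        [0, 1] to a scalar distance. Both images are cropped identically
        so no artificial border content is introduced."""
        h, w, _ = ref.shape
        base = metric_fn(ref[:h - k, :w - k], dist[:h - k, :w - k])
        shifted = metric_fn(ref[:h - k, :w - k], dist[k:, k:])  # dist shifted by (k, k)
        return shifted - base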
Conclusion
  • The authors demonstrate that making accurate predictions of human image quality judgements does not require supervised training.
  • The authors' model is competitive with or exceeds the performance of a recent supervised model [36], while relying on only a few essential ingredients: a dataset of natural videos; a compressive objective function employed to extract temporally persistent information from the data; a distributional parameterization flexible enough to express uncertainty; and a computational architecture that imposes spatial scale invariance as well as spatial and temporal translation invariance (a toy sketch of such an objective follows this list).
  • While the authors cannot claim that PIM is free of bias, it is completely unsupervised, which removes one possible source of bias
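The following toy loss illustrates the flavor of such a compressive, temporally predictive objective, assuming diagonal-Gaussian encodings of two neighboring frames. It is a sketch under those assumptions, not the paper's actual multivariate-information bound (cf. [12, 13]).

    import tensorflow as tf

    def gaussian_kl(mu0, var0, mu1, var1):
        """KL(N(mu0, var0) || N(mu1, var1)) for diagonal Gaussians,
        summed over latent dimensions."""
        return 0.5 * tf.reduce_sum(
            tf.math.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0,
            axis=-1)

    def pair_objective(mu_x, var_x, mu_y, var_y, beta=0.1):
        """Toy compressive objective over a pair of neighboring frames.

        Rate terms compress each frame's code toward a N(0, I) prior;
        the persistence term rewards neighboring frames mapping to
        similar codes (slowness). beta trades off the two."""
        zeros, ones = tf.zeros_like(mu_x), tf.ones_like(var_x)
        rate = (gaussian_kl(mu_x, var_x, zeros, ones)
                + gaussian_kl(mu_y, var_y, zeros, ones))
        persistence = (gaussian_kl(mu_x, var_x, mu_y, var_y)
                       + gaussian_kl(mu_y, var_y, mu_x, var_x))
        return tf.reduce_mean(beta * rate + persistence)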
Tables
  • Table 1: Scores on BAPPS. Best values are underlined; bold values are within 0.5% of the best. All numbers reported for LPIPS were computed using the code and weights provided by Zhang et al. [36]. Categories follow the same publication, and “all” indicates overall scores
  • Table 2: Score differences on pixel-shifted BAPPS. Bold values indicate the best in a column. MS-SSIM and NLPD lose over 7 percentage points on BAPPS-2AFC when shifting by only 2 pixels, while the deep metrics (including PIM) show only a negligible decrease. On BAPPS-JND the effect is even more stark: both traditional metrics’ scores decrease by almost 12 percentage points when shifting by 1 pixel, and by 18 for 2 pixels. LPIPS’ scores also decrease noticeably when shifting by 2 pixels: 7 points for LPIPS Alex and 3 for LPIPS Alex-lin. PIM’s performance decreases by less than one point
Related work
  • Early IQA metrics were based on a few hand-tunable parameters and architectural inductive biases informed by observations in visual psychology and/or computational models of early sensory neurons [e.g., 33, 28]. SSIM and its variants, perhaps the most popular descendants of this class [32, 31], define a quality index based on luminance, structure, and contrast changes as multiplicative factors. FSIM [35] weights edge distortions by a bottom-up saliency measure. PSNR-HVS and variant metrics explicitly model contrast sensitivity and frequency masking [10, 23].

    Another member of this class of metrics, the Normalized Laplacian Pyramid Distance [NLPD; 20], is more similar to our approach: an architectural bias (a multi-scale pyramid with divisive normalization) is imposed, and the parameters of the model (fewer than 100, much fewer than PIM's) are fitted to a dataset of natural images without supervision. However, NLPD uses a spatially predictive loss function, which is not explicitly information-theoretic. And while the model is demonstrated to reduce redundancy in the representation (i.e., it implements efficient coding), it does not exploit temporal learning.
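For reference, the sketch below shows a generic form of the divisive normalization step used in this family of models [16, 20]: each coefficient is divided by an estimate of local neighborhood activity. The constants are illustrative placeholders, not NLPD's fitted values.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def divisive_normalization(band, sigma=0.1, size=5):
        """Divide each pyramid coefficient by a local estimate of
        neighborhood activity, adaptively compressing contrast.

        band: 2-D array of subband coefficients; sigma and size are
        illustrative constants (a saturation offset and the local
        averaging window)."""
        activity = uniform_filter(np.abs(band), size=size)
        return band / (sigma + activity)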
References
  • Alexander A. Alemi et al. “Deep Variational Information Bottleneck”. In: Proc. of 5th Int. Conf. on Learning Representations. 2017. URL: https://openreview.net/forum?id=HyxQzBceg.
  • Fred Attneave. “Some Informational Aspects of Visual Perception”. In: Psychological Review 61.3 (1954). DOI: 10.1037/h0054663.
  • Horace B. Barlow. “Possible Principles Underlying the Transformations of Sensory Messages”. In: Sensory Communication. Contributions to the Symposium on Principles of Sensory Communication. M.I.T. Press, 1961, pp. 217–234. ISBN: 978-0-262-51842-0.
  • Alex Berardino et al. “Eigen-Distortions of Hierarchical Representations”. In: Advances in Neural Information Processing Systems 30. 2017.
  • William Bialek, Ilya Nemenman, and Naftali Tishby. “Predictability, complexity, and learning”. In: Neural Computation 13.11 (2001). DOI: 10.1162/089976601753195969.
  • Peter J. Burt and Edward H. Adelson. “The Laplacian Pyramid as a Compact Image Code”. In: IEEE Transactions on Communications 31.4 (Apr. 1983). DOI: 10.1109/TCOM.1983.1095851.
  • Troy Chinen et al. “Towards a Semantic Perceptual Image Metric”. In: 2018 25th IEEE International Conference on Image Processing (ICIP). 2018. DOI: 10.1109/ICIP.2018.8451611.
  • Keyan Ding et al. “Image Quality Assessment: Unifying Structure and Texture Similarity”. In: CoRR abs/2004.07728 (2020). URL: https://arxiv.org/abs/2004.07728.
  • Karen O. Egiazarian et al. “A New Full-Reference Quality Metric Based On HVS”. In: CD-ROM Proceedings of the Second International Workshop on Video Processing and Quality Metrics. 2006.
  • David J. Field. “Relations Between the Statistics of Natural Images and the Response Properties of Cortical Cells”. In: Journal of the Optical Society of America A 4.12 (1987). DOI: 10.1364/JOSAA.4.002379.
  • Ian Fischer. “Bounding the Multivariate Mutual Information”. In: Information Theory and Machine Learning Workshop (2019). URL: https://drive.google.com/file/d/17lJiJ4v_6h0p-ist_jCrr-o1ODi7yELx/view.
  • Ian Fischer. “The Conditional Entropy Bottleneck”. In: arXiv preprint arXiv:2002.05379 (2020).
  • Peter Földiák. “Learning Invariance from Transformation Sequences”. In: Neural Computation 3.2 (1991). DOI: 10.1162/neco.1991.3.2.194.
  • Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. “Image Style Transfer Using Convolutional Neural Networks”. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 2414–2423.
  • David J. Heeger. “Normalization of cell responses in cat striate cortex”. In: Visual Neuroscience 9.2 (1992). DOI: 10.1017/S0952523800009640.
  • Dan Hendrycks and Thomas Dietterich. “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations”. In: arXiv preprint arXiv:1903.12261 (2019).
  • Diederik P. Kingma and Jimmy Ba. “Adam: A Method for Stochastic Optimization”. In: arXiv preprint arXiv:1412.6980 (2014).
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet Classification with Deep Convolutional Neural Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105. URL: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.
  • Valero Laparra et al. “Perceptual image quality assessment using a normalized Laplacian pyramid”. In: Proceedings of SPIE, Human Vision and Electronic Imaging XXI. 2016. DOI: 10.2352/ISSN.2470-1173.2016.16.HVEI-103.
  • Graeme Mitchison. “Removing Time Variation with the Anti-Hebbian Differential Synapse”. In: Neural Computation 3.3 (1991). DOI: 10.1162/neco.1991.3.3.312.
  • Aaron van den Oord, Yazhe Li, and Oriol Vinyals. “Representation Learning with Contrastive Predictive Coding”. In: arXiv preprint arXiv:1807.03748 (2018).
  • Nikolay N. Ponomarenko et al. “On between-coefficient contrast masking of DCT basis functions”. In: CD-ROM Proc. of the Third International Workshop on Video Processing and Quality Metrics. 2007.
  • Ben Poole et al. “On Variational Bounds of Mutual Information”. In: Proc. of the 36th Int. Conf. on Machine Learning (ICML). 2019. URL: https://arxiv.org/abs/1905.06922.
  • Olga Russakovsky et al. “ImageNet Large Scale Visual Recognition Challenge”. In: International Journal of Computer Vision (IJCV) 115.3 (2015), pp. 211–252. DOI: 10.1007/s11263-015-0816-y.
  • Eero P. Simoncelli and William T. Freeman. “The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation”. In: 1995 IEEE International Conference on Image Processing (ICIP). Vol. 3. 1995. DOI: 10.1109/ICIP.1995.537667.
  • Karen Simonyan and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition”. In: arXiv preprint arXiv:1409.1556 (2014).
  • Patrick C. Teo and David J. Heeger. “Perceptual image distortion”. In: Proc. SPIE 2179, Human Vision, Visual Processing, and Digital Display V. 1994. DOI: 10.1117/12.172664.
  • Floris L. Van Nes and Maarten A. Bouman. “Spatial Modulation Transfer in the Human Eye”. In: Journal of the Optical Society of America A 57.3 (1967). DOI: 10.1364/JOSA.57.000401.
  • Zhou Wang and Eero P. Simoncelli. “Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual quantities”. In: Journal of Vision 8.12 (2008). DOI: 10.1167/8.12.8.
  • Zhou Wang, Eero P. Simoncelli, and Alan Conrad Bovik. “Multi-Scale Structural Similarity for Image Quality Assessment”. In: Conf. Rec. of the 37th Asilomar Conf. on Signals, Systems and Computers. 2003. DOI: 10.1109/ACSSC.2003.1292216.
  • Zhou Wang et al. “Image Quality Assessment: From Error Visibility to Structural Similarity”. In: IEEE Transactions on Image Processing 13.4 (2004). DOI: 10.1109/TIP.2003.819861.
  • A. B. Watson. “DCTune: A Technique for Visual Optimization of DCT Quantization Matrices for Individual Images”. In: Society for Information Display Digest of Technical Papers 24 (1993), pp. 946–949.
  • Laurenz Wiskott. “Slow Feature Analysis: A Theoretical Analysis of Optimal Free Responses”. In: Neural Computation 15.9 (2003). DOI: 10.1162/089976603322297331.
  • L. Zhang et al. “FSIM: A Feature Similarity Index for Image Quality Assessment”. In: IEEE Transactions on Image Processing 20.8 (Aug. 2011), pp. 2378–2386. ISSN: 1057-7149. DOI: 10.1109/TIP.2011.2109730.
  • Richard Zhang et al. “The Unreasonable Effectiveness of Deep Features as a Perceptual Metric”. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2018. URL: http://arxiv.org/abs/1801.03924. Code and data available at https://www.github.com/richzhang/PerceptualSimilarity.

Training data
  • We used 40 000 publicly available videos from YouTube that were available at a spatial resolution of at least 1920 × 1080 pixels. In an attempt not to skew the distribution of content too far from what may inform biological representation learning, we excluded most artificial content such as screenshots and videos of computer games. We decompressed one segment of 30 consecutive frames (corresponding to 1 second) from each video, yielding a total of ca. 11 hours of training video. To reduce video compression artifacts and prevent systematic downsampling artifacts, each segment was spatially downsampled to a randomized height between 128 and 160 pixels. Each segment was then separated into 15 pairs of neighboring frames, and a randomly placed but spatially colocated patch of 64 × 64 pixels was cropped out of each frame pair. The order of the frame pairs was randomized in a running buffer, and all RGB pixel values were normalized to the range between 0 and 1 before being fed into the model.
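A minimal sketch of this pipeline for a single decoded segment. The array shapes, the non-overlapping pairing of consecutive frames, and the RNG handling are assumptions consistent with the description above, not the paper's actual implementation.

    import numpy as np

    def frame_pair_patches(segment, patch=64, rng=None):
        """Turn one 30-frame segment into 15 colocated frame-pair crops.

        segment: uint8 array of shape (30, H, W, 3), already downsampled
        to a random height between 128 and 160. Returns an array of shape
        (15, 2, patch, patch, 3) with RGB values in [0, 1]."""
        rng = rng or np.random.default_rng()
        n, h, w, _ = segment.shape
        pairs = []
        for t in range(0, n, 2):  # 15 pairs of neighboring frames
            y = rng.integers(0, h - patch + 1)   # random placement, but the
            x = rng.integers(0, w - patch + 1)   # same crop for both frames
            crop = segment[t:t + 2, y:y + patch, x:x + patch, :]
            pairs.append(crop.astype(np.float32) / 255.0)
        pairs = np.stack(pairs)
        rng.shuffle(pairs)  # stand-in for the running shuffle buffer
        return pairs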