Understanding the role of individual units in a deep neural network

Proceedings of the National Academy of Sciences of the United States of America, pp. 30071-30078, 2020.

DOI: https://doi.org/10.1073/pnas.1907375117

Abstract:

Deep neural networks excel at finding hierarchical representations that solve complex tasks over large datasets. How can we humans understand these learned representations? In this work, we present network dissection, an analytic framework to systematically identify the semantics of individual hidden units within image classification and image generation networks.

Introduction
  • Deep neural networks excel at finding hierarchical representations that solve complex tasks over large data sets.
  • When a network contains a unit that activates on trees, the authors wish to understand whether it is a spurious correlation or whether the unit has a causal role that reveals how the network models its higher-level notions about trees.
  • To investigate these questions, the authors introduce network dissection [9, 10], their method for systematically mapping the semantic concepts found within a deep convolutional neural network.
  • When trained to classify or generate natural scene images, both types of networks learn individual units that match the visual concept of a ‘tree’, even though the network was never taught the tree concept during training.
Highlights
  • Deep neural networks excel at finding hierarchical representations that solve complex tasks over large data sets
  • Can the individual hidden units of a deep network teach us how the network solves a complex task? Intriguingly, within state-of-the-art deep networks, it has been observed that many single units match human-interpretable concepts that were not explicitly taught to the network: units have been found to detect objects, parts, textures, tense, gender, context, and sentiment [1,2,3,4,5,6,7]
  • We introduce network dissection [9, 10], our method for systematically mapping the semantic concepts found within a deep convolutional neural network (a minimal sketch of its scoring step follows this list)
  • The units reveal how the network decomposes the recognition of specific scene classes into particular visual concepts that are important to each scene class
  • The behavior of the units reveals contextual relationships that the model enforces between classes of objects in a scene
  • The network achieves classification accuracy of 53.3% on the held-out validation set
  • Network dissection relies on the emergence of disentangled, human-interpretable units during training
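
The scoring step of network dissection can be illustrated with a short sketch: a unit's activation map is thresholded at a high quantile, upsampled to image resolution, and compared against a pixel-level concept mask via intersection-over-union (IoU). This is only a minimal illustration under stated assumptions, not the authors' released code: the ImageNet-pretrained torchvision VGG-16 stands in for the paper's Places365-trained network, and the per-batch 0.99 quantile threshold and caller-supplied `concept_masks` tensor are illustrative choices.

```python
# Sketch of the network-dissection scoring step: compare one convolutional
# unit's high-activation region against a pixel-level concept mask via IoU.
# Assumptions: ImageNet-pretrained torchvision VGG-16 (stand-in for the
# paper's Places365-trained network), a fixed per-batch quantile threshold,
# and a caller-supplied binary `concept_masks` tensor.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

model = vgg16(weights="IMAGENET1K_V1").eval()
activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# conv5_3 is the last convolution in VGG-16's feature stack (index 28).
model.features[28].register_forward_hook(save_activation("conv5_3"))

@torch.no_grad()
def unit_concept_iou(images, concept_masks, unit, quantile=0.99):
    """IoU between a unit's thresholded activation map and a concept mask.

    images: (N, 3, H, W) preprocessed batch
    concept_masks: (N, H, W) binary masks for one visual concept (e.g. 'tree')
    """
    model(images)
    acts = activations["conv5_3"][:, unit]              # (N, h, w) feature maps
    acts = F.interpolate(acts.unsqueeze(1), size=concept_masks.shape[-2:],
                         mode="bilinear", align_corners=False).squeeze(1)
    thresh = torch.quantile(acts.flatten(), quantile)   # top-activation cutoff
    unit_region = acts > thresh
    mask = concept_masks.bool()
    intersection = (unit_region & mask).sum().float()
    union = (unit_region | mask).sum().float()
    return (intersection / union.clamp(min=1)).item()
```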
Methods
  • Places365 [53] consists of 1.80 million photographic images, each labeled with one of 365 scene classes.
  • The dataset includes 36,500 labeled validation images (100 per class) that are not used for training (a minimal loading sketch follows this list).
  • ImageNet [35] consists of 1.28 million photographic images, each focused on a single main object and labeled with one of 1,000 object classes.
  • LSUN is a dataset with a large number of 256×256 images in a few classes [42].
  • Recognizable people in dataset images have been anonymized by pixelating faces in visualizations.
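
As referenced above, here is a minimal sketch of how a Places365-style held-out validation split might be loaded and scored. The directory path, ImageFolder layout, preprocessing, and batch size are illustrative assumptions rather than the authors' actual pipeline.

```python
# Sketch of loading a Places365-style validation split for evaluation.
# Assumptions: an ImageFolder layout ("places365/val/<class_name>/*.jpg"),
# standard 224x224 preprocessing, and ImageNet normalization statistics.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

val_set = datasets.ImageFolder("places365/val", transform=preprocess)
val_loader = DataLoader(val_set, batch_size=64, shuffle=False, num_workers=4)

@torch.no_grad()
def top1_accuracy(model, loader, device="cuda"):
    """Fraction of validation images whose top prediction matches the label."""
    model = model.eval().to(device)
    correct = total = 0
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total
```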
Results
  • Figure 2b shows that removing just the four conv5_3 units most important to the ‘ski resort’ class reduces the network’s accuracy at discriminating ‘ski resort’ scenes from 81.4% to 64.0%, and removing the 20 most important units in conv5_3 reduces class accuracy further to 53.5%, near chance levels, even though classification accuracy over all scene classes is hardly affected.
  • Removing so many units damages the ability of the network to classify other scene classes: removing the 492 least-important units reduces all-class accuracy to 2.1%.
  • Removing the 20 most important conv5_3 units for each class reduces single-class accuracy to 53.0% on average, near chance levels (a sketch of this unit-ablation procedure follows the list).
  • Removing the 492 least important units only reduces single-class accuracy by an average of 3.6%, just a slight reduction.
  • The trained network achieves a classification accuracy of 53.3% on the held-out validation set.
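
A minimal sketch of the unit-ablation procedure referenced above: "removing" a unit is simulated by zeroing its conv5_3 activation channel with a forward hook, and accuracy is then re-measured. The layer reference, unit indices, and class index in the usage comments are placeholders, and the accuracy function below is a simplified stand-in for the paper's balanced single-class measure; the paper selects the units to remove by their importance to each class.

```python
# Sketch of the unit-ablation experiment: "removing" conv5_3 units is
# simulated by zeroing their activation channels with a forward hook, then
# re-measuring accuracy. The layer reference, unit indices, and class index
# in the usage comments are placeholders, not the paper's selected units.
import torch

def ablate_units(layer, unit_indices):
    """Register a hook that zeroes the given output channels of `layer`."""
    def hook(module, inputs, output):
        output = output.clone()
        output[:, list(unit_indices)] = 0.0
        return output                 # returned tensor replaces the layer output
    return layer.register_forward_hook(hook)

@torch.no_grad()
def single_class_accuracy(model, loader, target_class, device="cuda"):
    """Fraction of `target_class` images classified correctly (a simplified
    stand-in for the paper's balanced single-class accuracy measure)."""
    model = model.eval().to(device)
    correct = total = 0
    for images, labels in loader:
        keep = labels == target_class
        if keep.any():
            preds = model(images[keep].to(device)).argmax(dim=1).cpu()
            correct += (preds == target_class).sum().item()
            total += int(keep.sum())
    return correct / max(total, 1)

# Usage sketch (placeholder indices): zero four units, re-evaluate, restore.
# handle = ablate_units(model.features[28], unit_indices=[12, 77, 104, 233])
# acc_after = single_class_accuracy(model, val_loader, target_class=289)
# handle.remove()  # removing the hook restores the original network
```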
Conclusion
  • Simple measures of performance, such as classification accuracy, do not reveal how a network solves its task: good performance can be achieved by networks that have differing sensitivities to shapes, textures, or perturbations [34, 48].

    To develop an improved understanding of how a network works, the authors have presented a way to analyze the roles of individual network units.
  • The units reveal how the network decomposes the recognition of specific scene classes into particular visual concepts that are important to each scene class.
  • The behavior of the units reveals contextual relationships that the model enforces between classes of objects in a scene.
  • Network dissection relies on the emergence of disentangled, human-interpretable units during training.
  • The authors have seen that many such interpretable units appear in state-of-the-art models, both supervised and unsupervised.
  • How to train better disentangled models is an open problem that is the subject of ongoing efforts [49,50,51,52]
Funding
  • The authors are grateful for the support of the MIT-IBM Watson AI Lab, the DARPA XAI program FA875018-C0004, NSF 1524817 on Advancing Visual Recognition with Feature Visualizations, NSF BIGDATA 1447476, Grant RTI2018095232-B-C22 from the Spanish Ministry of Science, Innovation and Universities to AL, the Early Career Scheme (ECS) of Hong Kong (No. 24206219) to BZ, and a hardware donation from NVIDIA.
Study subjects and analysis
cases: 4
Figure 4c shows the effect of applying this procedure to activate 20 door units at two different locations in two generated images. Although the same intervention is applied in all four cases, the door obtained in each situation is different: in cases 1–3, the newly synthesized door has a size, style, and location appropriate to the scene context; in case 4, where the door units are activated on a tree, no new door is added to the image.
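
A minimal sketch of that kind of intervention, assuming a PyTorch GAN generator: a forward hook overwrites the chosen 'door' units inside a spatial window of an intermediate layer, and the remaining layers then decide how (or whether) to render a door. The generator, layer, unit indices, window, and activation value below are all illustrative placeholders, not the authors' exact setup.

```python
# Sketch of the unit-activation intervention: force a set of "door" units to
# a high value inside a spatial window of an intermediate GAN-generator
# layer, then re-render. The generator, layer, unit indices, window, and
# activation value are illustrative assumptions.
import torch

def force_units_on(layer, unit_indices, box, value=10.0):
    """Hook `layer` so the given channels are set to `value` inside `box`.

    box: (top, left, height, width) window in the layer's spatial grid.
    """
    top, left, h, w = box
    def hook(module, inputs, output):
        output = output.clone()
        output[:, list(unit_indices), top:top + h, left:left + w] = value
        return output                 # replaces the layer's output downstream
    return layer.register_forward_hook(hook)

@torch.no_grad()
def render_with_intervention(generator, layer, z, unit_indices, box):
    """Generate an image with the intervention active, then remove the hook."""
    handle = force_units_on(layer, unit_indices, box)
    try:
        image = generator(z)          # downstream layers decide how to render
    finally:
        handle.remove()               # always restore the unmodified generator
    return image
```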

Reference
  • 1 Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2015) Object detectors emerge in deep scene CNNs. ICLR.
  • 2 Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. ECCV.
  • 3 Mahendran A, Vedaldi A (2015) Understanding deep image representations by inverting them. CVPR.
  • 4 Olah C, et al. (2018) The building blocks of interpretability. Distill 3(3):e10.
  • 5 Bau A, et al. (2018) Identifying and controlling important neurons in neural machine translation. NeurIPS.
  • 6 Karpathy A, Johnson J, Fei-Fei L (2016) Visualizing and understanding recurrent networks. ICLR.
  • 7 Radford A, Jozefowicz R, Sutskever I (2017) Learning to generate reviews and discovering sentiment. arXiv preprint arXiv:1704.01444.
  • 8 Bengio Y, Courville A, Vincent P (2013) Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(8):1798–1828.
  • 9 Zhou B, Bau D, Oliva A, Torralba A (2018) Interpreting deep visual representations via network dissection. PAMI.
  • 10 Bau D, et al. (2019) GAN dissection: Visualizing and understanding generative adversarial networks. ICLR.
  • 11 LeCun Y, Bengio Y, et al. (1995) Convolutional networks for images, speech, and time series.
  • 12 Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. NeurIPS, pp. 1097–1105.
  • 13 Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. ICLR.
  • 14 Goodfellow I, et al. (2014) Generative adversarial nets. NeurIPS.
  • 15 Vinyals O, Toshev A, Bengio S, Erhan D (2015) Show and tell: A neural image caption generator. CVPR, pp. 3156–3164.
  • 16 He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. CVPR.
  • 17 Isola P, Zhu JY, Zhou T, Efros AA (2017) Image-to-image translation with conditional adversarial networks. CVPR.
  • 18 Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. ICCV.
  • 19 Karras T, Aila T, Laine S, Lehtinen J (2018) Progressive growing of GANs for improved quality, stability, and variation. ICLR.
  • 20 Bach S, et al. (2015) On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10(7).
  • 21 Zhou T, Krahenbuhl P, Aubry M, Huang Q, Efros AA (2016) Learning dense correspondence via 3D-guided cycle consistency. CVPR.
  • 22 Fong RC, Vedaldi A (2017) Interpretable explanations of black boxes by meaningful perturbation. ICCV, pp. 3429–3437.
  • 23 Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. NeurIPS.
  • 24 Selvaraju RR, et al. (2017) Grad-CAM: Visual explanations from deep networks via gradient-based localization. ICCV.
  • 25 Sundararajan M, Taly A, Yan Q (2017) Axiomatic attribution for deep networks. ICML.
  • 26 Smilkov D, Thorat N, Kim B, Viégas F, Wattenberg M (2017) SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825.
  • 27 Petsiuk V, Das A, Saenko K (2018) RISE: Randomized input sampling for explanation of black-box models. BMVC.
  • 28 Ribeiro MT, Singh S, Guestrin C (2016) "Why should I trust you?": Explaining the predictions of any classifier. SIGKDD (ACM), pp. 1135–1144.
  • 29 Kim B, Gilmer J, Viegas F, Erlingsson U, Wattenberg M (2017) TCAV: Relative concept importance testing with linear concept activation vectors. arXiv preprint arXiv:1711.11279.
  • 30 Koul A, Fern A, Greydanus S (2019) Learning finite state representations of recurrent policy networks. ICLR.
  • 31 Hendricks LA, et al. (2016) Generating visual explanations. ECCV (Springer), pp. 3–19.
  • 32 Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A (2014) Learning deep features for scene recognition using places database. NeurIPS.
  • 33 Xiao T, Liu Y, Zhou B, Jiang Y, Sun J (2018) Unified perceptual parsing for scene understanding. ECCV.
  • 34 Geirhos R, et al. (2018) ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv preprint arXiv:1811.12231.
  • 35 Deng J, et al. (2009) ImageNet: A large-scale hierarchical image database. CVPR.
  • 36 Wen W, Wu C, Wang Y, Chen Y, Li H (2016) Learning structured sparsity in deep neural networks. NeurIPS, pp. 2074–2082.
  • 37 Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2017) Pruning filters for efficient convnets. ICLR.
  • 38 Morcos AS, Barrett DG, Rabinowitz NC, Botvinick M (2018) On the importance of single directions for generalization. arXiv preprint arXiv:1803.06959.
  • 39 Zhou B, Sun Y, Bau D, Torralba A (2018) Revisiting the importance of individual units in CNNs via ablation. arXiv preprint arXiv:1806.02891.
  • 40 Radford A, Metz L, Chintala S (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. ICLR.
  • 41 Zhu JY, Krähenbühl P, Shechtman E, Efros AA (2016) Generative visual manipulation on the natural image manifold. ECCV.
  • 42 Yu F, et al. (2015) LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365.
  • 43 Szegedy C, et al. (2014) Intriguing properties of neural networks. ICLR.
  • 44 Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. ICLR.
  • 45 Carlini N, Wagner D (2017) Towards evaluating the robustness of neural networks. 2017 IEEE Symposium on Security and Privacy (SP).
  • 46 Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A (2018) Towards deep learning models resistant to adversarial attacks. ICLR.
  • 47 Rauber J, Brendel W, Bethge M (2017) Foolbox: A Python toolbox to benchmark the robustness of machine learning models. arXiv preprint arXiv:1707.04131.
  • 48 Ilyas A, et al. (2019) Adversarial examples are not bugs, they are features. arXiv preprint arXiv:1905.02175.
  • 49 Chen X, et al. (2016) InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. NeurIPS, pp. 2172–2180.
  • 50 Higgins I, et al. (2017) beta-VAE: Learning basic visual concepts with a constrained variational framework. ICLR 2(5):6.
  • 51 Zhang Q, Nian Wu Y, Zhu SC (2018) Interpretable convolutional neural networks. CVPR.
  • 52 Achille A, Soatto S (2018) Emergence of invariance and disentanglement in deep representations. JMLR 19(1):1947–1980.
  • 53 Zhou B, et al. (2017) Scene parsing through ADE20K dataset. CVPR.
  • 54 Van De Weijer J, Schmid C, Verbeek J, Larlus D (2009) Learning color names for real-world applications. IEEE Transactions on Image Processing 18(7):1512–1523.