How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking

Nicola De Cao
Michael Schlichtkrull
Wilker Aziz

EMNLP 2020.

Other Links: arxiv.org
Keywords:
San Jose, sentiment classification, NLP model, deep neural networks, hindsight bias (11+ more)
One-sentence summary:
To deal with these two challenges (an intractable objective and hindsight bias), we introduce Differentiable Masking

Abstract:

Attribution methods assess the contribution of inputs (e.g., words) to the model prediction. One way to do so is erasure: a subset of inputs is considered irrelevant if it can be removed without affecting the model prediction. Despite its conceptual simplicity, erasure is not commonly used in practice. First, the objective is generally […]

Introduction
Highlights
  • Deep neural networks (DNNs) have become standard tools in NLP, demonstrating impressive improvements over traditional approaches on many tasks (Goldberg, 2017)

    [Figure 1: the original model with hidden states h^(l), the gated input x̂ = x ⊙ z, and the model re-run on the gated input]
  • We study post hoc interpretability where the goal is to explain the prediction of a trained model and to reveal how the model arrives at the decision
  • The Broncos practiced at Stanford University and stayed at the Santa Clara Marriott
  • The recent developments in expressivity and efficacy of complex deep neural networks have come at the expense of interpretability
  • While systematically erasing inputs to determine how a model reacts leads to a neat interpretation, it comes with many issues, such as exponential computational time complexity and susceptibility to hindsight bias: if a word can be dropped from the input, this does not necessarily imply that it is not used by the model
  • We have introduced a new post hoc interpretation method which learns to completely remove subsets of inputs or hidden states through masking (a minimal illustrative sketch follows this list)
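To make the masking idea in these highlights concrete, here is a minimal, hypothetical sketch (not the authors' implementation): a small probe reads the frozen model's hidden states, predicts a relaxed binary gate per input token, the gated input is fed back through the model, and the probe is trained to keep the output distribution unchanged while closing as many gates as possible. The names, shapes, and the baseline vector below are illustrative assumptions.

```python
# Illustrative sketch only (not the authors' code): learn per-token gates post hoc.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatePredictor(nn.Module):
    """Predicts one gate per token from the frozen model's hidden states."""
    def __init__(self, hidden_size):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states):            # [batch, seq_len, hidden_size]
        logits = self.scorer(hidden_states).squeeze(-1)
        # A plain sigmoid keeps the sketch simple; the paper uses a stretched-and-
        # rectified (Hard Concrete) relaxation so gates can be exactly 0 or 1.
        return torch.sigmoid(logits)              # [batch, seq_len], values in (0, 1)

def gate_training_loss(model, x_emb, hidden_states, gate_predictor, baseline,
                       sparsity_weight=1.0):
    """Keep the frozen model's prediction while closing as many gates as possible.
    `hidden_states` come from the frozen model's forward pass on the original input."""
    with torch.no_grad():
        original_logits = model(x_emb)                       # prediction on full input
    z = gate_predictor(hidden_states).unsqueeze(-1)          # [batch, seq_len, 1]
    x_gated = z * x_emb + (1.0 - z) * baseline               # masked tokens -> baseline
    new_logits = model(x_gated)
    keep_prediction = F.kl_div(F.log_softmax(new_logits, dim=-1),
                               F.softmax(original_logits, dim=-1),
                               reduction="batchmean")
    sparsity = z.mean()                                      # proxy for an L0 penalty
    return keep_prediction + sparsity_weight * sparsity
```

Amortization, as the conclusion bullets put it, means this probe is trained once over the whole training set rather than optimized per example, which is what counters hindsight bias.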
Summary
  • Introduction:

    The power and flexibility of deep neural networks come at the expense of interpretability
  • This lack of interpretability can prevent users from trusting model predictions (Kim, 2015; Ribeiro et al., 2016), makes it hard to detect model or data deficiencies (Gururangan et al., 2018; Kaushik and Lipton, 2018), and makes it hard to verify that a model is fair and does not exhibit harmful biases (Sun et al., 2019; Holstein et al., 2019).
  • The Broncos practiced at Stanford University and stayed at the Santa Clara Marriott
  • Methods:

    [Methods compared: Erasure, Sundararajan et al. (2017), Schulz et al. (2020), Guan et al. (2019), and DIFFMASK; evaluation metrics: the divergences D_KL and D_JS]
  • Approaches: the authors compare DIFFMASK to integrated gradients (Sundararajan et al., 2017), one of the most widely used attribution methods, as well as to the perturbation methods of Schulz et al. (2020) and Guan et al. (2019).
  • The authors perform erasure by searching exhaustively for masked inputs that yield the same prediction (sketches of this search and of the integrated-gradients baseline follow this summary)
  • Results:

    The authors start with an example of input attributions (Figure 3), which illustrates how DIFFMASK goes beyond input attribution as typically known. The attribution provided by erasure (Figure 3a) is not informative: the search, in this case and in all other examples in the test set, finds a single digit that is sufficient to maintain the original prediction and discards all the other inputs.
  • The perturbation methods of Schulz et al. (2020) and Guan et al. (2019) (Figures 3b and 3d) are overly aggressive in pruning.
  • They assign low attribution to some items in the query even though these had to be considered when making the prediction.
  • Conclusion:

    The recent developments in expressivity and efficacy of complex deep neural networks have come at the expense of interpretability.
  • The authors have introduced a new post hoc interpretation method which learns to completely remove subsets of inputs or hidden states through masking.
  • The authors circumvent an intractable search by learning an end-to-end differentiable prediction model.
  • To circumvent the hindsight bias problem, the authors probe the model’s hidden states at different depths and amortize predictions over the training set.
  • The authors' method sheds light on what different layers ‘know’ about the input and on where information about the prediction is stored across layers
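As promised in the Methods bullets above, here is a brute-force sketch of the erasure search (illustrative only; the predict function and the mask token are assumptions, not the paper's interface). It enumerates subsets of positions to keep, smallest first, and returns the first subset that preserves the predicted label, which makes both the exponential cost and the hindsight-bias failure mode (a single 'sufficient' item often wins) easy to see.

```python
# Brute-force erasure sketch (illustrative only): search over all subsets of input
# positions for a smallest subset to KEEP such that the model's predicted class is
# unchanged when every other position is masked out.
from itertools import combinations

def exhaustive_erasure(tokens, predict, mask_token="<mask>"):
    """`predict` maps a list of tokens to a predicted label; assumed given."""
    original = predict(tokens)
    n = len(tokens)
    for size in range(n + 1):                        # smallest kept subsets first
        for keep in combinations(range(n), size):
            masked = [t if i in keep else mask_token for i, t in enumerate(tokens)]
            if predict(masked) == original:
                return keep                           # minimal sufficient subset
    return tuple(range(n))
```

The integrated-gradients baseline mentioned in the same bullets can be sketched as follows, assuming a model that maps embedded inputs to class logits: the attribution is the input-minus-baseline difference times the average gradient along the straight path between baseline and input (Sundararajan et al., 2017).

```python
# Sketch of integrated gradients (Sundararajan et al., 2017); the model signature
# (embeddings in, class logits out) is an assumption for illustration.
import torch

def integrated_gradients(model, x, baseline, target_class, steps=50):
    # x, baseline: [seq_len, emb_dim] input embeddings.
    total_grads = torch.zeros_like(x)
    for k in range(1, steps + 1):
        point = (baseline + (k / steps) * (x - baseline)).detach().requires_grad_(True)
        score = model(point.unsqueeze(0))[0, target_class]
        score.backward()                              # gradient of the target logit
        total_grads += point.grad
    return (x - baseline) * total_grads / steps       # per-dimension attributions
```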
Tables
  • Table 1: Toy task: average divergence in nats between the ground-truth attributions and those that the different methods assign to hidden states in the validation set. *Erasure produces a delta distribution that does not share support with the ground truth (a sketch of these divergence metrics follows this list of tables)
  • Table 2: Sentiment classification: non-amortized optimization with DIFFMASK and with REINFORCE (using a moving-average baseline for variance reduction), compared against exact search for erasure. All metrics are computed at the token level, while optimality is measured at the sentence level
  • Table 3: List of hyperparameters for the sentiment classification experiment. * is Kingma and Ba (2015), ** is Tieleman and Hinton (2012); Zhang et al. (2019)
  • Table 4: List of hyperparameters for the question answering experiment. * is Kingma and Ba (2015), ** is Tieleman and Hinton (2012); Zhang et al. (2019)
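The divergences reported in Table 1 are KL and JS divergences in nats between attribution distributions. A plain sketch of these two metrics is given below; it assumes each attribution vector has been normalized into a probability distribution over positions or hidden states, and uses natural logarithms so the values are in nats.

```python
# Sketch of the divergences reported in Table 1, assuming each attribution vector
# has been normalized into a probability distribution.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))           # natural log -> nats

def js_divergence(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p / p.sum() + q / q.sum())              # mixture of the two distributions
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```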
Related work
  • While we motivated our approach through its relation to erasure, an alternative way of looking at it is as a perturbation-based method. This recently introduced class of attribution methods (Ying et al., 2019; Guan et al., 2019; Schulz et al., 2020; Taghanaki et al., 2019) injects noise instead of erasing inputs. These methods can be regarded as continuous relaxations of erasure, though they are typically motivated from an information-theoretic perspective. Previous approaches use continuous gates, which may be problematic when the magnitude of the input changes, or they require making (Gaussian) assumptions about the input distribution. This means that information about the input can still leak to the predictor. These methods are also, like subset erasure, susceptible to hindsight bias. Our method uses mixed discrete-continuous gates, which can completely block the flow of information, and amortization to address both of these issues. We compare to perturbation-based methods in our experiments.
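The mixed discrete-continuous gates mentioned above are based on a stretched-and-rectified binary Concrete (Hard Concrete) distribution (Louizos et al., 2018). Below is a hedged sketch in the standard Louizos et al. parametrization; the temperature and stretch boundaries are illustrative, and the sign convention for the stretch may differ from the formula fragment quoted at the end of the reference list. The key property is that the gate can take the exact values 0 and 1, which is what lets it completely block the flow of information, unlike purely continuous gates.

```python
# Illustrative stretched-and-rectified ("Hard Concrete"-style) gate sampler:
# a relaxed Bernoulli sample is stretched beyond [0, 1] and then clipped, so the
# gate hits exactly 0 or 1 with non-zero probability while staying differentiable
# with respect to its parameters elsewhere.
import torch

def hard_concrete_gate(log_alpha, temperature=0.2, stretch=(-0.1, 1.1)):
    l, r = stretch                                    # stretch limits, l < 0 < 1 < r
    u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
    s = torch.sigmoid((u.log() - (1 - u).log() + log_alpha) / temperature)
    s_stretched = s * (r - l) + l                     # stretch outside [0, 1]
    return s_stretched.clamp(0.0, 1.0)                # rectify: exact 0s and 1s possible

# Example: sample gates for a batch of 5 token scores.
gates = hard_concrete_gate(torch.zeros(5))
```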
Funding
  • This project is supported by SAP Innovation Center Network, ERC Starting Grant BroadSem (678254), the Dutch Organization for Scientific Research (NWO) VIDI 639.022.518, and the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825299 (Gourmet)
References
  • Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. Proceedings of ICLR.
  • Betty van Aken, Benjamin Winter, Alexander Loser, and Felix A Gers. 2019. How does bert answer questions? a layer-wise analysis of transformer representations. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 1823–1832.
  • Sebastian Bach, Alexander Binder, Gregoire Montavon, Frederick Klauschen, Klaus-Robert Muller, and Wojciech Samek. 2015. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one, 10(7).
  • David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. 2010. How to explain individual classification decisions. Journal of Machine Learning Research, 11(Jun):1803–1831.
  • Joost Bastings, Wilker Aziz, and Ivan Titov. 2019. Interpretable neural predictions with differentiable binary variables. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2963–2977, Florence, Italy. Association for Computational Linguistics.
  • Yonatan Belinkov and James Glass. 2019. Analysis methods in neural language processing: A survey. Transactions of the Association for Computational Linguistics, 7:49–72.
  • Yoshua Bengio, Nicholas Leonard, and Aaron Courville. 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432.
  • Stephen Boyd, Stephen P Boyd, and Lieven Vandenberghe. 2004. Convex optimization. Cambridge university press.
  • Oana-Maria Camburu, Tim Rocktaschel, Thomas Lukasiewicz, and Phil Blunsom. 2018. e-snli: Natural language inference with natural language explanations. In Advances in Neural Information Processing Systems, pages 9539–9549.
  • Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724– 1734, Doha, Qatar. Association for Computational Linguistics.
  • Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, and Byron C Wallace. 2019. Eraser: A benchmark to evaluate rationalized nlp models. arXiv preprint arXiv:1911.03429.
  • Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, and Jordan Boyd-Graber. 2018. Pathologies of neural models make interpretations difficult. In Proceedings of EMNLP.
  • Yoav Goldberg. 2017. Neural network methods for natural language processing. Synthesis Lectures on Human Language Technologies, 10(1):1–309.
  • Chaoyu Guan, Xiting Wang, Quanshi Zhang, Runjin Chen, Di He, and Xing Xie. 2019. Towards a deep and unified understanding of deep neural models in nlp. In International Conference on Machine Learning, pages 2454–2463.
  • Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R Bowman, and Noah A Smith. 2018. Annotation artifacts in natural language inference data. arXiv preprint arXiv:1803.02324.
  • Yaru Hao, Li Dong, Furu Wei, and Ke Xu. 2019. Visualizing and understanding the effectiveness of BERT. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4143– 4152, Hong Kong, China. Association for Computational Linguistics.
  • Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daume III, Miro Dudik, and Hanna Wallach. 2019. Improving fairness in machine learning systems: What do industry practitioners need? In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pages 1–16.
  • Alon Jacovi and Yoav Goldberg. 2020. Towards faithfully interpretable nlp systems: How should we define and evaluate faithfulness? arXiv preprint arXiv:2004.03685.
  • Sarthak Jain and Byron C. Wallace. 2019. Attention is not Explanation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3543–3556, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Eric Jang, Shixiang Gu, and Ben Poole. 2017. Categorical reparameterization with Gumbel-Softmax. International Conference on Learning Representations.
  • Xisen Jin, Junyi Du, Zhongyu Wei, Xiangyang Xue, and Xiang Ren. 2020. Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models. International Conference on Learning Representations.
  • Divyansh Kaushik and Zachary C Lipton. 2018. How much reading does reading comprehension require? a critical investigation of popular benchmarks. Proceedings of EMNLP.
  • Been Kim. 2015. Interactive and interpretable machine learning models for human machine collaboration. Ph.D. thesis, Massachusetts Institute of Technology.
  • Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. International Conference for Learning Representations.
  • Diederik P Kingma and Max Welling. 2014. Autoencoding variational bayes. Proceedings of the 2nd International Conference on Learning Representations (ICLR).
  • Olga Kovaleva, Alexey Romanov, Anna Rogers, and Anna Rumshisky. 2019. Revealing the dark secrets of BERT. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4365–4374, Hong Kong, China. Association for Computational Linguistics.
  • Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing neural predictions. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 107–117, Austin, Texas. Association for Computational Linguistics.
  • Jiwei Li, Will Monroe, and Dan Jurafsky. 2016. Understanding neural networks through representation erasure. arXiv preprint arXiv:1612.08220.
  • Christos Louizos, Max Welling, and Diederik P Kingma. 2018. Learning Sparse Neural Networks through L0 Regularization. International Conference on Learning Representations (ICLR).
  • Chris J Maddison, Andriy Mnih, and Yee Whye Teh. 2017. The concrete distribution: A continuous relaxation of discrete random variables. International Conference on Learning Representations (ICLR).
  • Paul Michel, Omer Levy, and Graham Neubig. 2019. Are sixteen heads really better than one? In Advances in Neural Information Processing Systems, pages 14014–14024.
  • W James Murdoch and Arthur Szlam. 2017. Automatic rule extraction from long short term memory networks. arXiv preprint arXiv:1702.02540.
  • Weili Nie, Yang Zhang, and Ankit Patel. 2018. A theoretical explanation for perplexing behaviors of backpropagation-based visualizations. ICML.
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas. Association for Computational Linguistics.
  • Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. International Conference on Machine Learning (ICML).
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144.
  • Tim Rocktaschel, Edward Grefenstette, Karl Moritz Hermann, Tomas Kocisky, and Phil Blunsom. 2015. Reasoning about entailment with neural attention. arXiv preprint arXiv:1509.06664.
  • Anna Rogers, Olga Kovaleva, and Anna Rumshisky. 2020. A primer in bertology: What we know about how bert works. arXiv preprint arXiv:2002.12327.
  • Karl Schulz, Leon Sixt, Federico Tombari, and Tim Landgraf. 2020. Restricting the flow: Information bottlenecks for attribution. In International Conference on Learning Representations.
  • Sofia Serrano and Noah A. Smith. 2019. Is attention interpretable? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2931–2951, Florence, Italy. Association for Computational Linguistics.
  • Pascal Sturmfels, Scott Lundberg, and Su-In Lee. 2020. Visualizing the impact of feature attribution baselines. Distill, 5(1):e22.
  • Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang. 2019. Mitigating gender bias in natural language processing: Literature review. arXiv preprint arXiv:1906.08976.
  • Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 3319–3328. JMLR. org.
  • Saeid Asgari Taghanaki, Mohammad Havaei, Tess Berthier, Francis Dutil, Lisa Di Jorio, Ghassan Hamarneh, and Yoshua Bengio. 2019. Infomask: Masked variational latent representation to localize chest disease. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 739–747. Springer.
  • Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R Bowman, Dipanjan Das, et al. 2019. What do you learn from context? probing for sentence structure in contextualized word representations. arXiv preprint arXiv:1905.06316.
  • Lloyd S Shapley. 1953. A value for n-person games. Contributions to the Theory of Games, 2(28):307– 317.
  • Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 3145–3153. JMLR. org.
  • Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, 4(2):26–31.
  • Shikhar Vashishth, Shyam Upadhyay, Gaurav Singh Tomar, and Manaal Faruqui. 2019. Attention interpretability across NLP tasks. arXiv preprint arXiv:1909.11218.
  • Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2013. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.
  • Chandan Singh, W James Murdoch, and Bin Yu. 2018. Hierarchical interpretations for neural network predictions. arXiv preprint arXiv:1806.05337.
  • Leon Sixt, Maximilian Granz, and Tim Landgraf. 2019. When explanations lie: Why many modified BP attributions fail. arXiv preprint.
  • Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Seattle, Washington, USA. Association for Computational Linguistics.
  • Elena Voita, Rico Sennrich, and Ivan Titov. 2019a. The bottom-up evolution of representations in the transformer: A study with machine translation and language modeling objectives. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4396–4406, Hong Kong, China. Association for Computational Linguistics.
  • Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, and Ivan Titov. 2019b. Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5797–5808, Florence, Italy. Association for Computational Linguistics.
  • Ronald J Williams. 1992. Simple statistical gradientfollowing algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256.
  • Thomas Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.
  • Zhitao Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, and Jure Leskovec. 2019. GNNExplainer: Generating explanations for graph neural networks. In Advances in Neural Information Processing Systems, pages 9240–9251.
  • Michael Zhang, James Lucas, Jimmy Ba, and Geoffrey E Hinton. 2019. Lookahead optimizer: k steps forward, 1 step back. In Advances in Neural Information Processing Systems, pages 9593–9604.
  • Luisa M Zintgraf, Taco S Cohen, Tameem Adel, and Max Welling. 2017. Visualizing deep neural network decisions: Prediction difference analysis. ICLR.
  • [Appendix fragment] z = min(1, max(0, s · (l − r) + r)), where σ is the sigmoid function σ(x) = 1 / (1 + e^(−x)) and u ∼ U(0, 1); see Appendix B of Louizos et al. (2018) for more information.