Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?

ACL, pp. 4198-4205, 2020.

Abstract:

With the growing popularity of deep-learning based NLP models, comes a need for interpretable systems. But what is interpretability, and what constitutes a high-quality interpretation? In this opinion piece we reflect on the current state of interpretability evaluation research. We call for more clearly differentiating between different...

Introduction
  • Fueled by recent advances in deep-learning and language processing, NLP systems are increasingly being used for prediction and decision-making in many fields (Vig and Belinkov, 2019), including sensitive ones such as health, commerce and law (Fort and Couillault, 2016)
  • These highly flexible and highly effective neural models are opaque.
  • Current approaches define interpretation in a rather ad-hoc manner, motivated by practical use cases and applications.
  • This view often fails to distinguish between distinct aspects of the interpretation’s quality, such as readability, plausibility and faithfulness (Herman, 2017). The authors argue (§2, §5) that such conflation is harmful, and that faithfulness should be defined and evaluated explicitly, and independently from plausibility.
Highlights
  • Fueled by recent advances in deep-learning and language processing, NLP systems are increasingly being used for prediction and decision-making in many fields (Vig and Belinkov, 2019), including sensitive ones such as health, commerce and law (Fort and Couillault, 2016)
  • In the context of faithfulness, we must warn against Human-Computer Interaction (HCI)-inspired evaluation as well: increased performance in this setting is not indicative of faithfulness; rather, it is indicative of a correlation between the plausibility of the explanations and the model’s performance.
  • Consider the following fictional case of a non-faithful explanation system in an HCI evaluation setting: the explanation given is a heat-map of the textual input, attributing scores to various tokens (a minimal sketch of such a system appears after this list).
  • The opinion proposed in this paper is two-fold: first, interpretability evaluation often conflates the evaluation of faithfulness and plausibility.
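The following is a minimal, hypothetical sketch (in Python, not code from the paper) of the fictional heat-map system above: the token scores come from a hand-curated lexicon scaled by the model’s confidence and never consult the model’s decision process, yet they can look plausible and help users calibrate trust. The names opaque_model, unfaithful_heatmap and SENTIMENT_LEXICON are illustrative assumptions.

```python
import hashlib

# Hand-curated lexicon used only by the *explainer*, never by the model.
SENTIMENT_LEXICON = {"love": 0.9, "great": 1.0, "boring": 0.8, "terrible": 1.0}

def opaque_model(tokens):
    """Stand-in for a black-box classifier: label and confidence are derived from a
    hash of the input, so its internal computation has nothing to do with the lexicon."""
    h = int(hashlib.md5(" ".join(tokens).encode()).hexdigest(), 16)
    label = "positive" if h % 2 == 0 else "negative"
    confidence = 0.5 + (h % 500) / 1000.0  # confidence in [0.5, 1.0)
    return label, confidence

def unfaithful_heatmap(tokens, confidence):
    """A plausible-looking heat-map: lexicon words are highlighted, scaled by the
    model's confidence. It may help users decide when to trust the system, but it
    does not reflect how opaque_model reached its decision."""
    return {tok: confidence * SENTIMENT_LEXICON.get(tok.lower(), 0.0) for tok in tokens}

tokens = "I love this boring movie".split()
label, confidence = opaque_model(tokens)
print(label, round(confidence, 2))
print(unfaithful_heatmap(tokens, confidence))
```

Because the highlighted scores track the model’s confidence, an HCI-style study could reward this system with improved human decisions, even though the heat-map says nothing about the model’s actual reasoning.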
Results
  • While explanations have many different use cases, such as model debugging, legal guarantees or health-critical guarantees, another use case with a prominent body of evaluation literature is Intelligent User Interfaces (IUI), via Human-Computer Interaction (HCI), in which automatic models assist human decision-makers.
  • In this case, the goal of the explanation is to increase the degree of trust between the user and the system, giving the user more nuance about whether the system’s decision is likely to be correct.
  • In the fictional case described above, while the system is concretely useful, the claims made by the explanation do not reflect the model’s decision process at all.
Conclusion
  • The opinion proposed in this paper is two-fold: first, interpretability evaluation often conflates the evaluation of faithfulness and plausibility.
  • The authors argue that the two definitions should be teased apart, and that faithfulness should be evaluated on its own, without any supervision by, or influence from, the convincing power of the interpretation.
  • Second, faithfulness is often evaluated in a binary “faithful or not faithful” manner, and the authors believe a strictly faithful interpretation is a “unicorn” that will likely never be found.
  • They argue that faithfulness should instead be evaluated on a more nuanced “grayscale” that allows interpretations to be useful even if they are not globally and definitively faithful (one possible graded measure is sketched below).
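As a hedged illustration of such a graded evaluation (not a metric proposed in this paper), the sketch below scores faithfulness by erasure, in the spirit of the comprehensiveness measure of DeYoung et al. (2019): delete the k highest-attributed tokens and measure how much the predicted-class probability drops. Here model_prob and attribution are assumed interfaces, not calls to any real library.

```python
from typing import Callable, List

def erasure_faithfulness(
    model_prob: Callable[[List[str]], float],         # assumed: P(predicted class | tokens)
    attribution: Callable[[List[str]], List[float]],  # assumed: one relevance score per token
    tokens: List[str],
    k: int = 3,
) -> float:
    """Drop in predicted-class probability after removing the k highest-attributed
    tokens. Larger drops suggest the explanation points at tokens the model truly
    relies on; the result is a graded score rather than a binary verdict."""
    scores = attribution(tokens)
    top_k = set(sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k])
    reduced = [tok for i, tok in enumerate(tokens) if i not in top_k]
    return model_prob(tokens) - model_prob(reduced)
```

Averaging this quantity over a dataset, or sweeping k, yields a continuum of faithfulness scores that can be compared across explanation methods, rather than a single “faithful or not faithful” verdict.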
Funding
  • We also thank the reviewers for additional feedback and for pointing to relevant literature in HCI and IUI. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, grant agreement No. 802774 (iEXTRACT).
References
  • Ashraf M. Abdul, Jo Vermeulen, Danding Wang, Brian Y. Lim, and Mohan S. Kankanhalli. 2018. Trends and trajectories for explainable, accountable and intelligible systems: An HCI research agenda. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI 2018, Montreal, QC, Canada, April 21-26, 2018, page 582. ACM.
  • David Alvarez-Melis and Tommi S. Jaakkola. 2018. On the robustness of interpretability methods. CoRR, abs/1806.08049.
  • Leila Arras, Franziska Horn, Grégoire Montavon, Klaus-Robert Müller, and Wojciech Samek. 2016. “What is relevant in a text document?”: An interpretable machine learning approach. CoRR, abs/1612.07843.
  • Joris Baan, Maartje ter Hoeve, Marlies van der Wees, Anne Schuth, and Maarten de Rijke. 2019. Do transformer attention heads provide transparency in abstractive summarization? CoRR, abs/1907.00570.
  • Przemyslaw Biecek. 2018. DALEX: Explainers for complex predictive models in R. J. Mach. Learn. Res., 19:84:1–84:5.
  • Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, and Phil Blunsom. 2019. Can I trust the explainer? Verifying post-hoc explanatory methods.
  • Vicente Ivan Sanchez Carmona, Tim Rocktäschel, Sebastian Riedel, and Sameer Singh. 2015. Towards extracting faithful and descriptive representations of latent variable models. In AAAI Spring Symposia.
  • Supriyo Chakraborty, Richard Tomsett, Ramya Raghavendra, Daniel Harborne, Moustafa Alzantot, Federico Cerutti, Mani Srivastava, Alun Preece, Simon Julier, Raghuveer M. Rao, et al. 2017. Interpretability of deep learning models: A survey of results. In 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pages 1–6. IEEE.
  • Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, and Byron C. Wallace. 2019. ERASER: A benchmark to evaluate rationalized NLP models.
  • Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.
  • Shi Feng and Jordan Boyd-Graber. 2019. What can AI do for me? Evaluating machine learning interpretations in cooperative play. In Proceedings of the 24th International Conference on Intelligent User Interfaces, IUI '19, pages 229–239, New York, NY, USA. Association for Computing Machinery.
  • Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, and Jordan L. Boyd-Graber. 2018. Pathologies of neural models make interpretation difficult. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, pages 3719–3728. Association for Computational Linguistics.
  • Karen Fort and Alain Couillault. 2016.
  • Reza Ghaeini, Xiaoli Z. Fern, and Prasad Tadepalli. 2018. Interpreting recurrent and attention-based neural models: A case study on natural language inference. CoRR, abs/1808.03894.
  • Amirata Ghorbani, Abubakar Abid, and James Zou. 2019. Interpretation of neural networks is fragile. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3681–3688.
  • Leilani H. Gilpin, David Bau, Ben Z. Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. 2018. Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), pages 80–89. IEEE.
  • Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. A survey of methods for explaining black box models. ACM Comput. Surv., 51(5):93:1–93:42.
  • L. A. Harrington, M. D. Morley, A. Scedrov, and S. G. Simpson. 1985. Harvey Friedman's Research on the Foundations of Mathematics. Studies in Logic and the Foundations of Mathematics. Elsevier Science.
  • Bernease Herman. 2017. The promise and peril of human evaluation for model interpretability. CoRR, abs/1711.07414. Withdrawn.
  • Alon Jacovi, Oren Sar Shalom, and Yoav Goldberg. 2018. Understanding convolutional neural networks for text classification. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 56–65.
  • Sarthak Jain and Byron C. Wallace. 2019. Attention is not explanation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 3543–3556. Association for Computational Linguistics.
  • Harmanpreet Kaur, Harsha Nori, Samuel Jenkins, Rich Caruana, Hanna M. Wallach, and Jennifer Wortman Vaughan. 2019. Interpreting interpretability: Understanding data scientists' use of interpretability tools for machine learning.
  • Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viégas, and Rory Sayres. 2017. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV).
  • Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, and Been Kim. 2019. The (un)reliability of saliency methods. In Wojciech Samek, Grégoire Montavon, Andrea Vedaldi, Lars Kai Hansen, and Klaus-Robert Müller, editors, Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, volume 11700 of Lecture Notes in Computer Science, pages 267–280. Springer.
  • Isaac Lage, Emily Chen, Jeffrey He, Menaka Narayanan, Been Kim, Sam Gershman, and Finale Doshi-Velez. 2019. An evaluation of the human-interpretability of explanation. CoRR, abs/1902.00006.
  • Himabindu Lakkaraju, Ece Kamar, Rich Caruana, and Jure Leskovec. 2019. Faithful and customizable explanations of black box models. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, AIES 2019, Honolulu, HI, USA, January 27-28, 2019, pages 131–138. ACM.
  • Jaesong Lee, Joong-Hwi Shin, and Jun-Seok Kim. 2017. Interactive visualization and manipulation of attention-based neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 121–126, Copenhagen, Denmark. Association for Computational Linguistics.
  • Zachary C. Lipton. 2018. The mythos of model interpretability. Commun. ACM, 61(10):36–43.
  • Scott M. Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 4765–4774.
  • Sina Mohseni and Eric D. Ragan. 2018. A human-grounded evaluation benchmark for local explanations of machine learning. CoRR, abs/1801.05075.
  • W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. 2019. Interpretable machine learning: Definitions, methods, and applications. CoRR, abs/1901.04592.
  • Dong Nguyen. 2018. Comparing automatic and human evaluation of local explanations for text classification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1069–1078, New Orleans, Louisiana. Association for Computational Linguistics.
  • Nina Poerner, Hinrich Schütze, and Benjamin Roth. 2018. Evaluating neural network explanation methods using hybrid documents and morphological prediction. CoRR, abs/1801.06422.
  • Danish Pruthi, Mansi Gupta, Bhuwan Dhingra, Graham Neubig, and Zachary C. Lipton. 2019. Learning to deceive with attention-based explanations. CoRR, abs/1909.07913.
  • Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, and Richard Socher. 2019. Explain yourself! Leveraging language models for commonsense reasoning. CoRR, abs/1906.02361.
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. “Why Should I Trust You?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pages 1135–1144, New York, NY, USA. ACM.
  • Cynthia Rudin. 2018. Please stop explaining black box models for high stakes decisions. CoRR, abs/1811.10154.
  • Sofia Serrano and Noah A. Smith. 2019. Is attention interpretable? In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers, pages 2931–2951.
  • Julia Strout, Ye Zhang, and Raymond J. Mooney. 2019. Do human rationales improve machine explanations? CoRR, abs/1905.13714.
  • Madhumita Sushil, Simon Šuster, and Walter Daelemans. 2018. Rule induction for global explanation of trained models. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 82–97, Brussels, Belgium. Association for Computational Linguistics.
  • Shikhar Vashishth, Shyam Upadhyay, Gaurav Singh Tomar, and Manaal Faruqui. 2019. Attention interpretability across NLP tasks. CoRR, abs/1909.11218.
  • Jesse Vig and Yonatan Belinkov. 2019. Analyzing the structure of attention in a transformer language model. CoRR, abs/1906.04284.
  • Hilde J. P. Weerts, Werner van Ipenburg, and Mykola Pechenizkiy. 2019. A human-grounded evaluation of SHAP for alert processing. CoRR, abs/1907.03324.
  • Sarah Wiegreffe and Yuval Pinter. 2019. Attention is not not explanation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pages 11–20. Association for Computational Linguistics.
  • Lior Wolf, Tomer Galanti, and Tamir Hazan. 2019. A formal approach to explainability. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, AIES 2019, Honolulu, HI, USA, January 27-28, 2019, pages 255–261. ACM.
  • Jialin Wu and Raymond J. Mooney. 2018. Faithful multimodal explanation for visual question answering. CoRR, abs/1809.02805.
  • Wenting Xiong, Iftitahu Ni'mah, Juan M. G. Huesca, Werner van Ipenburg, Jan Veldsink, and Mykola Pechenizkiy. 2018. Looking deeper into deep learning model: Attribution-based explanations of TextCNN. CoRR, abs/1811.03970.
  • Mo Yu, Shiyu Chang, Yang Zhang, and Tommi S. Jaakkola. 2019. Rethinking cooperative rationalization: Introspective extraction and complement control. CoRR, abs/1910.13294.
  • Omar Zaidan and Jason Eisner. 2008. Modeling annotators: A generative approach to learning from annotator rationales. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 31–40, Honolulu, Hawaii. Association for Computational Linguistics.