Anchors: High-Precision Model-Agnostic Explanations

AAAI, 2018.


Abstract:

We introduce a novel model-agnostic system that explains the behavior of complex models with high-precision rules called anchors, representing local, “sufficient” conditions for predictions. We propose an algorithm to efficiently compute these explanations for any black-box model with high-probability guarantees. We demonstrate the flexibility of anchors by explaining a myriad of different models for different domains and tasks. In a user study, we show that anchors enable users to predict how a model would behave on unseen instances with less effort and higher precision, as compared to existing linear explanations or no explanations.

Introduction
  • Sophisticated machine learning models such as deep neural networks have been shown to be highly accurate for many applications, even though their complexity makes them virtually black boxes.
  • Linear functions can capture the relative importance of features in an easy-to-understand manner.
  • Since these linear explanations are in some way local, it is not clear whether they apply to an unseen instance. (The accompanying figure compares (a) instances, (b) LIME explanations, and (c) anchor explanations.)
  • In other words, their coverage is unclear.
  • When combined with the arithmetic involved in computing feature contributions in linear explanations, the human effort required can be quite high (a small sketch contrasting the two kinds of explanation follows this list).
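To make the effort difference concrete, here is a minimal sketch contrasting how a user applies a linear explanation versus an anchor. The data structures and helper names (apply_linear_explanation, anchor_applies) are illustrative assumptions for this summary, not the paper's API; the example anchor is only in the spirit of the paper's adult-dataset rules.

```python
# A minimal sketch, assuming a linear explanation is a dict of numeric feature
# weights plus an intercept (as LIME produces over binarized features), and an
# anchor is a dict of feature-value predicates. Names are assumptions, not the
# paper's actual API.

def apply_linear_explanation(weights, intercept, instance):
    """With a linear explanation, the user must sum weighted feature contributions."""
    score = intercept + sum(w * instance.get(f, 0.0) for f, w in weights.items())
    return "positive" if score > 0 else "negative"

def anchor_applies(anchor, instance):
    """With an anchor, the user only checks whether every predicate holds."""
    return all(instance.get(f) == v for f, v in anchor.items())

# Hypothetical example in the spirit of the adult-dataset anchors:
instance = {"relationship": "Husband", "education": "Bachelors", "age": 45}
anchor = {"relationship": "Husband", "education": "Bachelors"}
print(anchor_applies(anchor, instance))  # True -> reuse the anchored prediction
# If the anchor does not apply, the explanation makes no claim: its coverage is explicit.
```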
Highlights
  • Sophisticated machine learning models such as deep neural networks have been shown to be highly accurate for many applications, even though their complexity makes them virtually black boxes.
  • As a consequence of the need for users to understand the behavior of these models, interpretable machine learning has seen a resurgence in recent years, ranging from the design of novel globally-interpretable machine learning models (Lakkaraju, Bach, and Leskovec 2016; Ustun and Rudin 2015; Wang and Rudin 2015) to local explanations that can be computed for any classifier (Baehrens et al. 2010; Ribeiro, Singh, and Guestrin 2016b; Strumbelj and Kononenko 2010).
  • Unclear coverage can lead to low human precision, as users may think an insight from an explanation applies to unseen instances even when it does not
  • Visual Question Answering (VQA): As a final example, we present anchors for the Visual Question Answering task: answering a question asked of a reference image
  • We have argued that high precision and clear coverage are crucial for interpretable explanations of a model’s local behavior
  • We showed that anchors lead to higher human precision than linear explanations and require less effort to understand and apply.
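In the paper, an anchor A for an instance x is a rule whose precision, prec(A) = E_{D(z|A)}[1[f(x) = f(z)]], must exceed a threshold τ with probability at least 1 − δ, while its coverage cov(A) = E_{D(z)}[A(z)] measures how often the rule applies. The paper estimates precision adaptively with a KL-LUCB-style multi-armed-bandit procedure (Kaufmann and Kalyanakrishnan 2013) to obtain that guarantee; the snippet below is only a plain Monte-Carlo sketch of the two quantities, with hypothetical helper names (model_predict, anchor_applies, sample_perturbation).

```python
def estimate_precision_and_coverage(model_predict, anchor_applies, x,
                                    sample_perturbation, n_samples=1000):
    """Plain Monte-Carlo estimates of an anchor's precision and coverage.

    model_predict:       the black-box classifier, instance -> label
    anchor_applies:      the candidate rule A, instance -> bool
    x:                   the instance being explained
    sample_perturbation: draws a perturbed instance z from the perturbation
                         distribution D(z) around x
    """
    original_label = model_predict(x)
    applies = agrees = 0
    for _ in range(n_samples):
        z = sample_perturbation(x)
        if anchor_applies(z):
            applies += 1
            if model_predict(z) == original_label:
                agrees += 1
    coverage = applies / n_samples                    # estimate of cov(A)
    precision = agrees / applies if applies else 0.0  # estimate of prec(A)
    return precision, coverage
```

The actual algorithm samples from D(z|A) directly and decides adaptively how many samples each candidate rule needs, which is where the high-probability guarantee mentioned in the abstract comes from; a fixed sample size as above is only for illustration.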
Methods
  • [Figure: user-study comparison of No expls, LIME(1), Anchor(1), LIME(2), and Anchor(2) on (a) the adult dataset and (b) the rcdv dataset.]
  • Most users would prefer a set of explanations that explain most of the model with as little effort on their part as possible: explanations picked using the submodular procedure described before.
  • In Figure 4, the authors show the coverage for the gb classifier (gradient boosted trees) on two of the datasets as the user sees more explanations, chosen either via submodular pick (SP-LIME and SP-Anchor) or at random (RP-LIME and RP-Anchor).
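The submodular pick referred to above selects a small set of anchors whose combined coverage of the data is as large as possible; greedy selection by marginal coverage gain is the standard approximation for this kind of objective. The sketch below assumes hypothetical helpers (a list of candidate explanations and anchor_applies(expl, z) -> bool); it illustrates the idea, not the authors' exact implementation.

```python
def submodular_pick(explanations, instances, anchor_applies, k):
    """Greedily pick up to k explanations (anchors) that together cover as many
    instances as possible."""
    covered = set()            # indices of instances already covered
    picked = []
    remaining = list(explanations)
    for _ in range(k):
        best, best_gain = None, 0
        for expl in remaining:
            gain = sum(1 for i, z in enumerate(instances)
                       if i not in covered and anchor_applies(expl, z))
            if gain > best_gain:
                best, best_gain = expl, gain
        if best is None:       # no remaining explanation adds coverage
            break
        picked.append(best)
        remaining.remove(best)
        covered.update(i for i, z in enumerate(instances) if anchor_applies(best, z))
    return picked
```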
Conclusion
  • The authors have argued that high precision and clear coverage are crucial for interpretable explanations of a model’s local behavior.
  • The authors introduced a novel family of rule-based, model-agnostic explanations called anchors, designed to exhibit both these properties.
  • Anchors highlight the part of the input that is sufficient for the classifier to make the prediction, making them intuitive and easy to understand.
  • The authors demonstrated the flexibility of the anchor approach by explaining predictions from a variety of classifiers on multiple domains.
  • The authors showed that anchors lead to higher human precision than linear explanations and require less effort to understand and apply.
Tables
  • Table1: Anchors for Part-of-Speech tag for the word “play”
  • Table2: Anchors (in bold) of a machine translation system for the Portuguese word for “This” (in pink)
  • Table3: Generated anchors for tabular datasets
  • Note on Table 2: the first row means that when the words “This”, “is”, and “question” appear in the English input, the translation will include the word “Esta”. In Portuguese, the translation of “this” depends on the gender of the word it refers to (“esta” for feminine, “este” for masculine), or should be “isso” if its referent is not in the sentence. The anchors show that the model captures this behavior: they always include “this is” and the word that “this” refers to (“question” is feminine, “problem” is masculine).
  • Table4: Average precision and coverage with simulated users on 3 tabular datasets and 3 classifiers. lime-n indicates direct application of LIME to unseen instances, while lime-t indicates a threshold was tuned using an oracle to achieve the same precision as the anchor approach. The anchor approach maintains very high precision, while naive use of linear explanations leads to varying degrees of precision (a sketch of the simulated-user protocol follows this list).
  • Table5: Results of the User Study. Underline: significant w.r.t. anchors in the same dataset and same number of explanations. Results show that users consistently achieve high precision with anchors, as opposed to baselines, with less effort (time)
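Table 4's simulated users follow a simple protocol: on an unseen instance, a user holding a set of anchors predicts the anchored label whenever some anchor applies and abstains otherwise; precision is agreement with the model on the covered instances, and coverage is the fraction of instances covered. The sketch below spells this out under assumed helper signatures (anchor_applies, anchor_label, model_predict); it illustrates the evaluation described in the caption, not the authors' code.

```python
def simulated_user_eval(anchors, unseen_instances, model_predict,
                        anchor_applies, anchor_label):
    """Precision/coverage of a simulated user who only predicts when an anchor applies."""
    covered = correct = 0
    for z in unseen_instances:
        matching = [a for a in anchors if anchor_applies(a, z)]
        if not matching:
            continue                      # no anchor applies -> the user abstains
        covered += 1
        if anchor_label(matching[0]) == model_predict(z):
            correct += 1
    n = len(unseen_instances)
    coverage = covered / n if n else 0.0
    precision = correct / covered if covered else 1.0
    return precision, coverage
```

The lime-n and lime-t conditions in the caption correspond to applying the linear explanation to every unseen instance, without and with an oracle-tuned confidence threshold, respectively.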
Related work
  • Even in the few cases where having some understanding of a machine learning model’s behavior is not a requirement, it is certainly an advantage. Relying only on validation accuracy has many well-studied problems, as practitioners consistently overestimate their model’s accuracy (Patel et al. 2008), propagate feedback loops (Sculley et al. 2015), or fail to notice data leaks (Kaufman, Rosset, and Perlich 2011).

    Compared to other interpretable options, rules fare well; users prefer, trust, and understand rules better than alternatives (Lim, Dey, and Avrahami 2009; Stumpf et al. 2007), in particular rules similar to anchors. Short, disjoint rules are easier to interpret than hierarchies like decision lists or trees (Lakkaraju, Bach, and Leskovec 2016). A number of approaches construct globally interpretable models, many based on rules (Lakkaraju, Bach, and Leskovec 2016; Letham et al. 2015; Wang and Rudin 2015; Wang et al. 2015). With such models, the user should be able to guess the model’s behavior on any example (i.e., perfect coverage). However, these models are not appropriate for many domains: almost no interpretable rule-based system is suitable for text or image applications due to the sheer size of the feature space, or is accurate enough. Interpretability, in these cases, comes at the cost of flexibility, accuracy, or efficiency (Ribeiro, Singh, and Guestrin 2016a). An alternative is learning a simple (interpretable) model to imitate the black-box model.
    [Figure: (a) Original image, (b) Anchor for “beagle”.]
Funding
  • This work was supported in part by ONR award #N00014-13-1-0023, and in part by FICO and Adobe Research
  • The views expressed are those of the authors and do not reflect the policy or position of the funding agencies
References
  • Baehrens, D.; Schroeter, T.; Harmeling, S.; Kawanabe, M.; Hansen, K.; and Muller, K.-R. 2010. How to explain individual classification decisions. Journal of Machine Learning Research 11.
  • Cover, T. M., and Thomas, J. A. 1991. Elements of Information Theory. New York, NY, USA: Wiley-Interscience.
  • Craven, M. W., and Shavlik, J. W. 1996. Extracting tree-structured representations of trained networks. In Advances in Neural Information Processing Systems, 24–30.
  • De Raedt, L., and Kersting, K. 2008. Probabilistic inductive logic programming. Berlin, Heidelberg: Springer-Verlag. chapter Probabilistic Inductive Logic Programming, 1–27.
  • Kaufman, S.; Rosset, S.; and Perlich, C. 2011. Leakage in data mining: Formulation, detection, and avoidance. In Knowledge Discovery and Data Mining (KDD).
  • Kaufmann, E., and Kalyanakrishnan, S. 2013. Information complexity in bandit subset selection. In Proceedings of the Twenty-Sixth Annual Conference on Learning Theory (COLT 2013), volume 30 of JMLR Workshop and Conference Proceedings, 228–251. JMLR.
  • Krause, A., and Golovin, D. 2014. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems. Cambridge University Press.
  • Lakkaraju, H.; Bach, S. H.; and Leskovec, J. 2016. Interpretable decision sets: A joint framework for description and prediction. In Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, 1675–1684. New York, NY, USA: ACM.
  • Letham, B.; Rudin, C.; McCormick, T. H.; and Madigan, D. 2015. Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Annals of Applied Statistics.
  • Lim, B. Y.; Dey, A. K.; and Avrahami, D. 2009. Why and why not explanations improve the intelligibility of context-aware intelligent systems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09, 2119–2128. New York, NY, USA: ACM.
  • Patel, K.; Fogarty, J.; Landay, J. A.; and Harrison, B. 2008. Investigating statistical machine learning as a tool for software development. In Human Factors in Computing Systems (CHI).
  • Pennington, J.; Socher, R.; and Manning, C. D. 2014. Glove: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), 1532–1543.
  • Ren, M.; Kiros, R.; and Zemel, R. S. 2015. Exploring models and data for image question answering. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS’15, 2953–2961.
  • Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016a. Model-agnostic interpretability of machine learning. In Human Interpretability in Machine Learning workshop, ICML ’16.
  • Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016b. “Why should I trust you?”: Explaining the predictions of any classifier. In Knowledge Discovery and Data Mining (KDD).
  • Sanchez, I.; Rocktäschel, T.; Riedel, S.; and Singh, S. 2015. Towards extracting faithful and descriptive representations of latent variable models. In AAAI Spring Symposium on Knowledge Representation and Reasoning (KRR): Integrating Symbolic and Neural Approaches.
  • Schmidt, P., and Witte, A. D. 1988. Predicting Recidivism in North Carolina, 1978 and 1980. Inter-university Consortium for Political and Social Research.
  • Sculley, D.; Holt, G.; Golovin, D.; Davydov, E.; Phillips, T.; Ebner, D.; Chaudhary, V.; Young, M.; and Crespo, J.-F. 2015. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NIPS).
  • Strumbelj, E., and Kononenko, I. 2010. An efficient explanation of individual classifications using game theory. Journal of Machine Learning Research 11.
  • Stumpf, S.; Rajaram, V.; Li, L.; Burnett, M.; Dietterich, T.; Sullivan, E.; Drummond, R.; and Herlocker, J. 2007. Toward harnessing user feedback for machine learning. In Proceedings of the 12th International Conference on Intelligent User Interfaces, IUI ’07, 82–91. New York, NY, USA: ACM.
  • Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; and Rabinovich, A. 2015. Going deeper with convolutions. In Computer Vision and Pattern Recognition (CVPR).
  • Tsoumakas, G., and Katakis, I. 2006. Multi-label classification: An overview. International Journal of Data Warehousing and Mining 3(3).
  • Ustun, B., and Rudin, C. 2015. Supersparse linear integer models for optimized medical scoring systems. Machine Learning.
  • Vedaldi, A., and Soatto, S. 2008. Quick shift and kernel methods for mode seeking. In European Conference on Computer Vision, 705–718. Springer.
  • Vinyals, O.; Kaiser, L.; Koo, T.; Petrov, S.; Sutskever, I.; and Hinton, G. E. 2015. Grammar as a foreign language. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, 2773–2781.
  • Wang, F., and Rudin, C. 2015. Falling rule lists. In Artificial Intelligence and Statistics (AISTATS).
  • Wang, T.; Rudin, C.; Doshi-Velez, F.; Liu, Y.; Klampfl, E.; and MacNeille, P. 2015. Or’s of And’s for interpretable classification, with application to context-aware recommender systems. arXiv:1504.07614.
  • Wieting, J.; Bansal, M.; Gimpel, K.; and Livescu, K. 2015. Towards universal paraphrastic sentence embeddings. CoRR abs/1511.08198.
  • Zhu, Y.; Groth, O.; Bernstein, M.; and Fei-Fei, L. 2016. Visual7W: Grounded Question Answering in Images. In IEEE Conference on Computer Vision and Pattern Recognition.