Benchmarking Deep Learning Interpretability in Time Series Predictions

NeurIPS 2020

TL;DR: We propose and report multiple metrics to empirically evaluate the performance of saliency methods for detecting feature importance over time, using both precision and recall.
Abstract

Saliency methods are used extensively to highlight the importance of input features in model predictions. These methods are mostly used in vision and language tasks, and their application to time series data is relatively unexplored. In this paper, we set out to extensively compare the performance of various saliency-based interpretability methods...
Introduction
  • As the use of Machine Learning models increases in various domains [1, 2], reliable model explanations become crucial [3, 4].
  • This need has resulted in the development of numerous interpretability methods that estimate feature importance [5,6,7,8,9,10,11,12,13].
  • Adebayo et al. [16] measure changes in the attributions when randomizing model parameters or labels.
Highlights
  • As the use of Machine Learning models increases in various domains [1, 2], reliable model explanations become crucial [3, 4].
  • Based on our extensive experiments, we report the following observations: (i) feature importance estimators that produce high-quality saliency maps in images often fail to provide similarly high-quality interpretations in time series data; (ii) saliency methods tend to fail to distinguish important from non-important features within a given time step: if a feature at a given time step is assigned high saliency, almost all other features at that time step tend to receive high saliency regardless of their actual values; (iii) model architectures have significant effects on the quality of saliency maps.
  • Precision and recall: looking at the precision and recall distribution box plots (Figure 7), we observe the following: (a) model architecture has the largest effect on precision and recall; (b) results do not show clear distinctions between saliency methods; (c) methods can identify informative time steps while failing to identify informative features, as AUPR in the time domain is higher than in the feature domain; (d) methods identify most features in an informative time step as salient, so the area under the recall curve (AUR) in the feature domain is very high while the area under the precision curve (AUP) is very low (a metric-computation sketch follows this list).
  • We have studied deep learning interpretation methods when applied to multivariate time series data on various neural network architectures
  • That is, when temporal and feature domains are combined in a multivariate time series, saliency methods break down in general
  • We propose a two-step temporal saliency rescaling approach to adapt existing saliency methods to time series data
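Because the synthetic benchmarks come with known informative (time, feature) positions, precision- and recall-style metrics can be computed by comparing a saliency map against that ground-truth mask. The following is a minimal sketch of such a computation, not the paper's exact protocol: the array names, the threshold sweep for AUP/AUR, and the max-based collapse to the time and feature domains are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve

def saliency_pr_metrics(saliency, ground_truth):
    """AUPR, AUP, and AUR of a saliency map against a binary ground-truth mask.

    saliency:     array of importance scores (any shape).
    ground_truth: binary array of the same shape, 1 where a value is truly informative.
    """
    scores = np.abs(saliency).ravel()
    labels = ground_truth.ravel().astype(int)

    # Area under the precision-recall curve.
    precision, recall, _ = precision_recall_curve(labels, scores)
    aupr = auc(recall, precision)

    # Area under precision and recall as functions of a saliency threshold.
    thresholds = np.linspace(0.0, 1.0, 101)
    norm = scores / (scores.max() + 1e-12)
    prec, rec = [], []
    for th in thresholds:
        pred = norm >= th
        tp = np.sum(pred & (labels == 1))
        prec.append(tp / max(pred.sum(), 1))
        rec.append(tp / max(labels.sum(), 1))
    aup = auc(thresholds, np.asarray(prec))
    aur = auc(thresholds, np.asarray(rec))
    return aupr, aup, aur

def domain_metrics(saliency, ground_truth):
    """Collapse a (T, N) map to the time domain (max over features) and the
    feature domain (max over time steps) before computing the same metrics."""
    time_metrics = saliency_pr_metrics(saliency.max(axis=1), ground_truth.max(axis=1))
    feature_metrics = saliency_pr_metrics(saliency.max(axis=0), ground_truth.max(axis=0))
    return time_metrics, feature_metrics
```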
Methods
  • The authors compare popular backpropagation-based and perturbation-based post-hoc saliency methods; each method assigns an importance, or relevance, score to each input feature at every time step.
  • (d) Methods identify most features in an informative time step as salient: AUR in the feature domain is very high while AUP is very low.
  • A steep drop in model accuracy does not by itself indicate that a saliency method is correctly identifying the features used by the model, since in most cases the saliency methods with the leftmost curves in Figure 6 have the lowest precision and recall values (a masking sketch follows this list).
  • The maps for the bivariate and multivariate Grad are harder to interpret; applying the proposed temporal saliency rescaling approach to bivariate and multivariate time series significantly improves the quality of the saliency maps, in some cases making them even better than those for images or univariate time series.
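The accuracy-drop evaluation mentioned above can be sketched as follows: mask an increasing fraction of the most salient (time, feature) entries and re-measure classification accuracy. Below is a minimal PyTorch sketch under assumed conventions, a classifier `model` taking (batch, time, features) tensors and returning logits, and a constant `baseline` value for masked entries; neither is necessarily the paper's exact setup.

```python
import torch

@torch.no_grad()
def accuracy_drop_curve(model, x, y, saliency,
                        fractions=(0.0, 0.1, 0.2, 0.4, 0.8), baseline=0.0):
    """Accuracy after masking the top-k most salient (time, feature) entries.

    x: (B, T, N) inputs, y: (B,) labels, saliency: (B, T, N) importance scores.
    """
    model.eval()
    B, T, N = x.shape
    order = saliency.reshape(B, -1).argsort(dim=1, descending=True)  # most salient first
    accuracies = []
    for frac in fractions:
        k = int(frac * T * N)
        x_masked = x.clone().reshape(B, -1)
        if k > 0:
            x_masked.scatter_(1, order[:, :k], baseline)  # replace top-k entries
        logits = model(x_masked.reshape(B, T, N))
        accuracies.append((logits.argmax(dim=1) == y).float().mean().item())
    return list(fractions), accuracies
```

A curve that drops after masking only a few entries looks better on this test, but as the bullet above notes, such curves should be read alongside precision and recall against the ground truth.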
Results
  • MNIST: Figure 12 shows saliency maps produced by each pair on samples from time series MNIST; Figure 13 shows the same samples after applying TSR.
  • There is a significant improvement in the quality of the saliency map after applying the temporal saliency rescaling approach.
  • Synthetic datasets: Figure 14 shows saliency maps produced by each pair on samples from different synthetic datasets, before and after applying TSR.
Conclusion
  • Summary and conclusion: In this work, the authors have studied deep learning interpretation methods applied to multivariate time series data on various neural network architectures.
  • The authors have found that commonly used saliency methods, including both gradient-based and perturbation-based methods, fail to produce high-quality interpretations when applied to multivariate time series data.
  • The authors observe that these methods generally identify salient time steps but cannot distinguish important from non-important features within a given time step.
  • Building on this observation, the authors propose a two-step temporal saliency rescaling (TSR) approach to adapt existing saliency methods to time series data (a sketch follows this list).
  • This approach leads to substantial improvements in the quality of the saliency maps produced by different methods.
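The two-step rescaling can be sketched as follows: first score each time step by how much the saliency map changes when that entire time step is masked; then, only within time steps whose score is high, score individual features by the same masking procedure and take the product of the two scores. The baseline value, the cutoff `alpha`, and the black-box `saliency_fn` below are illustrative assumptions; see the paper's algorithm for the exact procedure.

```python
import numpy as np

def temporal_saliency_rescaling(x, saliency_fn, alpha=0.5, baseline=0.0):
    """Two-step rescaling of a saliency method for a single (T, N) input.

    saliency_fn: callable mapping a (T, N) input to a (T, N) relevance map
                 (e.g. a wrapper around Gradient, Integrated Gradients, ...).
    alpha:       fraction of the largest time-relevance score below which a
                 time step is treated as uninformative (illustrative choice).
    """
    T, N = x.shape
    base_map = saliency_fn(x)

    # Step 1: time-relevance score = change in total saliency when the
    # whole time step is replaced by the baseline value.
    time_rel = np.zeros(T)
    for t in range(T):
        x_t = x.copy()
        x_t[t, :] = baseline
        time_rel[t] = np.abs(base_map - saliency_fn(x_t)).sum()

    # Step 2: within informative time steps, score each feature by the same
    # masking procedure; the final relevance is the product of the two scores.
    rescaled = np.zeros((T, N))
    cutoff = alpha * time_rel.max()
    for t in range(T):
        if time_rel[t] <= cutoff:
            continue
        for i in range(N):
            x_ti = x.copy()
            x_ti[t, i] = baseline
            feat_rel = np.abs(base_map - saliency_fn(x_ti)).sum()
            rescaled[t, i] = time_rel[t] * feat_rel
    return rescaled
```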
Tables
  • Table 1: Results from TCN on the Middle Box and Moving Box synthetic datasets. Higher AUPR, AUP, and AUR values indicate better performance; lower AUC values are better, as they indicate a faster drop in accuracy.
  • Table 2: Confusion matrix used for the precision and recall calculation.
  • Table 3: Complexity analysis of different variations of TSR.
Funding
  • This project was supported in part by NSF CAREER AWARD 1942230, a grant from NIST 303457-00001, AWS Machine Learning Research Award and Simons Fellowship on “Foundations of Deep Learning.”
Study subjects and analysis
Datasets: 70
Different dataset combinations are shown in Figure 1. Each synthetic dataset is generated by seven different processes, as shown in Figure 2, giving a total of 70 datasets. Each feature is independently sampled from one of the following (two of these processes are sketched below):
  • (a) A Gaussian with zero mean and unit variance.
  • (b) Independent sequences of a standard autoregressive time series with Gaussian noise.
  • (c) A standard continuous autoregressive time series with Gaussian noise.
  • (d) A Gaussian process mixture model.
  • (e) Non-uniform samples from a harmonic function.
  • (f) Sequences of a standard non-linear autoregressive moving average (NARMA) time series with Gaussian noise.
  • (g) Non-uniform samples from a pseudo-periodic function with Gaussian noise.
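For concreteness, here is a minimal sketch of how two of these processes could be sampled, (a) i.i.d. Gaussian noise and (b) an autoregressive sequence with Gaussian noise; the AR coefficient and noise scale are illustrative choices rather than the paper's exact parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_feature(T):
    """(a) i.i.d. Gaussian with zero mean and unit variance."""
    return rng.standard_normal(T)

def autoregressive_feature(T, phi=0.8, noise_std=0.1):
    """(b) Autoregressive sequence x_t = phi * x_{t-1} + Gaussian noise.

    phi and noise_std are illustrative, not the paper's exact parameters.
    """
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + noise_std * rng.standard_normal()
    return x

# A toy multivariate sample: N features over T time steps, each drawn
# independently from one of the processes above.
T, N = 50, 10
sample = np.stack(
    [gaussian_feature(T) if i % 2 == 0 else autoregressive_feature(T) for i in range(N)],
    axis=1,
)  # shape (T, N)
```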

References
  • [1] Michael L. Rich. Machine learning, automated suspicion algorithms, and the Fourth Amendment. University of Pennsylvania Law Review, 2016.
  • [2] Ziad Obermeyer and Ezekiel J. Emanuel. Predicting the future—big data, machine learning, and clinical medicine. The New England Journal of Medicine, 2016.
  • [3] Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In International Conference on Knowledge Discovery and Data Mining, 2015.
  • [4] Zachary C. Lipton. The mythos of model interpretability. Queue, 2018.
  • [5] David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. How to explain individual classification decisions. Journal of Machine Learning Research, 2010.
  • [6] Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 2015.
  • [7] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR, 2013.
  • [8] Pieter-Jan Kindermans, Kristof Schütt, Klaus-Robert Müller, and Sven Dähne. Investigating the influence of noise and distractors on the interpretation of neural networks. arXiv preprint arXiv:1611.07270, 2016.
  • [9] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, 2017.
  • [10] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
  • [11] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. In International Conference on Machine Learning, 2017.
  • [12] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, 2017.
  • [13] Alexander Levine, Sahil Singla, and Soheil Feizi. Certifiably robust interpretation in deep learning. arXiv preprint arXiv:1905.12105, 2019.
  • [14] Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, and Been Kim. A benchmark for interpretability methods in deep neural networks. In Advances in Neural Information Processing Systems, 2019.
  • [15] Amirata Ghorbani, Abubakar Abid, and James Zou. Interpretation of neural networks is fragile. In AAAI Conference on Artificial Intelligence, 2019.
  • [16] Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim. Sanity checks for saliency maps. In Advances in Neural Information Processing Systems, 2018.
  • [17] Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, and Been Kim. The (un)reliability of saliency methods. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, 2019.
  • [18] Sahil Singla, Eric Wallace, Shi Feng, and Soheil Feizi. Understanding impacts of high-order loss approximations and features in deep learning interpretation. In International Conference on Machine Learning, 2019.
  • [19] Jimmy Ba and Rich Caruana. Do deep nets really need to be deep? In Advances in Neural Information Processing Systems, 2014.
  • [20] Nicholas Frosst and Geoffrey Hinton. Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784, 2017.
  • [21] Andrew Slavin Ross, Michael C. Hughes, and Finale Doshi-Velez. Right for the right reasons: Training differentiable models by constraining their explanations. arXiv preprint arXiv:1703.03717, 2017.
  • [22] Mike Wu, Michael C. Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, and Finale Doshi-Velez. Beyond sparsity: Tree regularization of deep models for interpretability. In AAAI Conference on Artificial Intelligence, 2018.
  • [23] Aya Abdelsalam Ismail, Mohamed Gunady, Luiz Pessoa, Hector Corrada Bravo, and Soheil Feizi. Input-cell attention reduces vanishing saliency of recurrent neural networks. In Advances in Neural Information Processing Systems, 2019.
  • [24] Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision, 2014.
  • [25] Marco Ancona, Enea Ceolini, Cengiz Öztireli, and Markus Gross. Towards better understanding of gradient-based attribution methods for deep neural networks. In International Conference on Learning Representations, 2018.
  • [26] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, 2018.
  • [27] Wojciech Samek, Alexander Binder, Grégoire Montavon, Sebastian Lapuschkin, and Klaus-Robert Müller. Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems, 2016.
  • [28] Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
  • [29] Pieter-Jan Kindermans, Kristof T. Schütt, Maximilian Alber, Klaus-Robert Müller, Dumitru Erhan, Been Kim, and Sven Dähne. Learning how to explain neural networks: PatternNet and PatternAttribution. arXiv preprint arXiv:1705.05598, 2017.
  • [30] Sana Tonekaboni, Shalmali Joshi, David Duvenaud, and Anna Goldenberg. What went wrong and when? Instance-wise feature importance for time-series models. arXiv preprint arXiv:2003.02821, 2020.
  • [31] Michaela Hardt, Alvin Rajkomar, Gerardo Flores, Andrew Dai, Michael Howell, Greg Corrado, Claire Cui, and Moritz Hardt. Explaining an increase in predicted risk for clinical alerts. In ACM Conference on Health, Inference, and Learning, 2020.
  • [32] Harini Suresh, Nathan Hunt, Alistair Johnson, Leo Anthony Celi, Peter Szolovits, and Marzyeh Ghassemi. Clinical intervention prediction and understanding using deep networks. arXiv preprint arXiv:1705.08498, 2017.
  • [33] Christoph Molnar. Interpretable Machine Learning. Lulu.com, 2020.
  • [34] Javier Castro, Daniel Gómez, and Juan Tejada. Polynomial calculation of the Shapley value based on sampling. Computers & Operations Research, 36(5):1726–1730, 2009.
  • [35] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 1997.
  • [36] Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.
  • [37] Colin Lea, Michael Flynn, Rene Vidal, Austin Reiter, and Gregory Hager. Temporal convolutional networks for action segmentation and detection. In Conference on Computer Vision and Pattern Recognition, 2017.
  • [38] Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271, 2018.
  • [39] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, 2017.
  • [40] Zachary C. Lipton. The doctor just won't accept that! arXiv preprint arXiv:1711.08037, 2017.
  • [41] Sana Tonekaboni, Shalmali Joshi, Melissa D. McCradden, and Anna Goldenberg. What clinicians want: Contextualizing explainable machine learning for clinical end use. arXiv preprint arXiv:1905.05134, 2019.
  • [42] Carl Edward Rasmussen. Gaussian processes in machine learning. In Summer School on Machine Learning, 2003.
  • [43] David C. Van Essen, Stephen M. Smith, Deanna M. Barch, Timothy E. J. Behrens, Essa Yacoub, Kamil Ugurbil, WU-Minn HCP Consortium, et al. The WU-Minn Human Connectome Project: An overview. NeuroImage, 2013.
Authors
Aya Abdelsalam Ismail
Mohamed Gunady
Soheil Feizi