# Learning to Explain: An Information-Theoretic Perspective on Model Interpretation

ICML, pp. 882-891, 2018.

Abstract:

We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example. This feature selector is trained to maximize the mutual information between selected features and the response variable.


Introduction

- Interpretability is an extremely important criterion when a machine learning model is applied in areas such as medicine, financial markets, and criminal justice (see, e.g., the discussion paper by Lipton (2016) and references therein).
- Many complex models, such as random forests, kernel methods, and deep neural networks, have been developed and employed to optimize prediction accuracy, which can compromise their ease of interpretation.
- While feature selection produces a global importance of features with respect to the entire labeled data set, instancewise feature selection measures feature importance locally for each instance labeled by the model

Highlights

- Given a machine learning model, instancewise feature selection asks for the importance score of each feature on the prediction of a given instance, and the relative importance of each feature is allowed to vary across instances
- A related concept in machine learning is feature selection, which selects a subset of features that are useful to build a good predictor for a specified response variable (Guyon & Elisseeff, 2003)
- We have proposed a framework for instancewise feature selection via mutual information, together with a method, L2X, which seeks a variational approximation of the mutual information and uses a Gumbel-softmax relaxation of discrete subset sampling during training
- We have shown the efficiency and capacity of L2X for instancewise feature selection on both synthetic and real data sets
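The Gumbel-softmax subset sampling mentioned above can be sketched in a few lines. The following is a minimal NumPy illustration of the general construction (the function name and details are ours, not the authors' released code): take the elementwise max over k independent Gumbel-softmax (Concrete) samples to obtain a relaxed, approximately k-hot feature mask.

```python
import numpy as np

def gumbel_softmax_subset(logits, k, tau=0.5, seed=0):
    """Relaxed sampling of an approximately k-hot feature mask.

    Draws k independent Gumbel-softmax (Concrete) samples over the d
    features and combines them with an elementwise max, giving a
    continuous mask in [0, 1]^d that concentrates on k features as the
    temperature tau decreases.
    """
    rng = np.random.default_rng(seed)
    d = logits.shape[-1]
    # Gumbel(0, 1) noise, one draw per (sample, feature)
    g = rng.gumbel(size=(k, d))
    # Temperature-scaled softmax over features, row by row
    z = (logits + g) / tau
    z = np.exp(z - z.max(axis=-1, keepdims=True))
    z = z / z.sum(axis=-1, keepdims=True)
    # Elementwise max over the k relaxed one-hot rows
    return z.max(axis=0)

mask = gumbel_softmax_subset(np.zeros(10), k=3)
```

At low temperatures the mask approaches a hard k-hot vector, which is what makes discrete subset selection differentiable during training.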

Methods

- The authors carry out experiments on both synthetic and real data sets. For all experiments, they use RMSprop with the default hyperparameters for optimization.
- Code for reproducing the key results is available online at https://github.com/Jianbo-Lab/L2X.
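For reference, a single RMSprop update with commonly used default hyperparameters (learning rate 0.001, decay 0.9; this sketch is ours, not the paper's code) keeps a decaying average of squared gradients and scales each step by its root:

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update: maintain a decaying average of squared
    gradients and divide the step by its square root, giving each
    parameter an adaptive learning rate."""
    cache = rho * cache + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Minimize f(w) = w^2 starting from w = 5.0
w, cache = np.array(5.0), np.array(0.0)
for _ in range(200):
    w, cache = rmsprop_step(w, 2.0 * w, cache)
```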

Results

- The CNN model achieves 90% accuracy on the test data, close to the state-of-the-art performance.

Conclusion

- The authors have proposed a framework for instancewise feature selection via mutual information, together with a method, L2X, which seeks a variational approximation of the mutual information and uses a Gumbel-softmax relaxation of discrete subset sampling during training.
- L2X is the first method to realize real-time interpretation of a black-box model.
- The authors have shown the efficiency and capacity of L2X for instancewise feature selection on both synthetic and real data sets.

Summary

## Objectives:

The authors aim to maximize the mutual information between the response variable from the model and the selected features, as a function of the choice of selection rule.
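The mutual-information objective stated earlier can be written in symbols (the notation here is a paraphrase, not the paper's exact symbols): writing $\mathcal{E}$ for the selection rule that maps an input $X$ to a subset $S$ of $k$ features $X_S$, the goal is

$$\max_{\mathcal{E}} \; I(X_S; Y) \quad \text{subject to} \quad |S| = k.$$

Since mutual information is generally intractable, a standard variational lower bound is maximized instead (the entropy term $H(Y)$ is a constant with respect to $\mathcal{E}$):

$$I(X_S; Y) = \mathbb{E}\left[\log \frac{p(Y \mid X_S)}{p(Y)}\right] \geq \mathbb{E}\left[\log q(Y \mid X_S)\right] + H(Y),$$

where $q$ is a tractable approximation to the conditional distribution $p(Y \mid X_S)$.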

- Table 1: Summary of the properties of different methods. “Training” indicates whether a method requires training on an unlabeled data set. “Efficiency” qualitatively evaluates the computational time during single interpretation. “Additive” indicates whether a method is locally additive. “Model-agnostic” indicates whether a method is generic to black-box models
- Table 2: True labels and labels predicted by the model are in the first two columns. Key words picked by L2X are highlighted in yellow
- Table 3: True labels and labels from the model are shown in the first two columns. Key sentences picked by L2X are highlighted in yellow
- Table 4: Post-hoc accuracy and human accuracy of L2X on three models: a word-based CNN model on IMDB, a hierarchical LSTM model on IMDB, and a CNN model on MNIST
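The post-hoc accuracy in Table 4 can be sketched as follows: re-feed the model only the selected features and check how often its prediction agrees with the prediction on the full input. Masking by zeroing, as below, is a simplifying assumption of ours; the paper's actual treatment of unselected features depends on the data modality.

```python
import numpy as np

def post_hoc_accuracy(model, X, masks):
    """Fraction of instances where the model's predicted class on the
    masked input (unselected features zeroed out) matches its predicted
    class on the full input."""
    full_pred = model(X).argmax(axis=-1)
    masked_pred = model(X * masks).argmax(axis=-1)
    return float((full_pred == masked_pred).mean())

# Toy check with a linear "model" whose logits depend only on feature 0
W = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])  # (features, classes)
model = lambda X: X @ W
X = np.array([[2.0, 1.0, -1.0], [-3.0, 0.5, 2.0]])
masks = np.array([[1, 0, 0], [1, 0, 0]])  # keep only feature 0
acc = post_hoc_accuracy(model, X, masks)
```

Since only feature 0 drives this toy model and the mask keeps it, the masked predictions match the full ones exactly.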

Funding

- L.S. was also supported in part by NSF IIS-1218749, NIH BIGDATA 1R01GM108341, NSF CAREER IIS-1350983, NSF IIS-1639792 EAGER, NSF CNS-1704701, ONR N00014-15-1-2340, Intel ISTC, NVIDIA and Amazon AWS

Study subjects and analysis

synthetic data sets: 4

Compared against Taylor, Saliency, DeepLIFT, and SHAP, the authors begin with experiments on four synthetic data sets, including 2-dimensional XOR as binary classification.
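As a rough illustration of the first synthetic task, a toy generator in this spirit might look as follows. This is illustrative only and not the paper's exact generative process; here the label simply depends on whether the first two Gaussian features share a sign, so only those two features are informative.

```python
import numpy as np

def make_xor_data(n=1000, d=10, seed=0):
    """Toy 2-dimensional XOR classification data (illustrative only):
    features are standard Gaussian, and the binary label is 1 exactly
    when the first two coordinates have the same sign, so only
    features 0 and 1 carry information about the label."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    y = (X[:, 0] * X[:, 1] > 0).astype(int)
    return X, y

X, y = make_xor_data(100)
```

An ideal instancewise explainer should rank features 0 and 1 as the most influential for every instance of this task.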

data sets: 3

Otherwise, the 6th–9th dimensions are used to generate Y from the nonlinear additive model. The first three data sets are modified from commonly used data sets in the feature selection literature (Chen et al., 2017), while the fourth is designed specifically for instancewise feature selection.

data sets: 4

We also report the clock time of each method in Figure 2; all experiments were performed on a single NVidia Tesla K80 GPU, coded in TensorFlow. Across all four data sets, SHAP and LIME are the least efficient, as they require multiple evaluations of the model, while DeepLIFT, Taylor, and Saliency require only a backward pass of the model.

samples: 10000

A graphical model shows how X_S is obtained from X. The clock time (in log scale) of explaining 10,000 samples is reported for each method, with the training time of L2X shown in translucent bars. Box plots show the median ranks of the influential features per sample, over 10,000 samples for each data set; the red line and the dotted blue line on each box are the median and the mean, respectively. Lower median ranks are better, and the dotted green lines indicate the optimal median rank. A further figure shows ten randomly selected images of the digits 3 and 8 from the validation set, where the first row includes the original digits and the second does not; selected patches are colored red if the pixel is activated (white) and blue otherwise.

References

- LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- Ba, J., Mnih, V., and Kavukcuoglu, K. Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755, 2014.
- Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., and Samek, W. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one, 10(7):e0130140, 2015.
- Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., and Müller, K.-R. How to explain individual classification decisions. Journal of Machine Learning Research, 11(Jun):1803–1831, 2010.
- Bahdanau, D., Cho, K., and Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv e-prints, abs/1409.0473, September 2014.
- Chen, J., Stern, M., Wainwright, M. J., and Jordan, M. I. Kernel feature selection via conditional covariance minimization. In Advances in Neural Information Processing Systems 30, pp. 6949–6958. 2017.
- Chollet, F. et al. Keras. https://github.com/keras-team/keras, 2015.
- Cover, T. M. and Thomas, J. A. Elements of information theory. John Wiley & Sons, 2012.
- Gao, S., Ver Steeg, G., and Galstyan, A. Variational information maximization for feature selection. In Advances in Neural Information Processing Systems, pp. 487–495, 2016.
- Guyon, I. and Elisseeff, A. An introduction to variable and feature selection. Journal of machine learning research, 3(Mar):1157–1182, 2003.
- Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
- Jang, E., Gu, S., and Poole, B. Categorical reparameterization with gumbel-softmax. stat, 1050:1, 2017.
- Kim, Y. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
- Li, J., Luong, M.-T., and Jurafsky, D. A hierarchical neural autoencoder for paragraphs and documents. arXiv preprint arXiv:1506.01057, 2015.
- Lipton, Z. C. The mythos of model interpretability. arXiv preprint arXiv:1606.03490, 2016.
- Lundberg, S. M. and Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pp. 4768–4777, 2017.
- Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., and Potts, C. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pp. 142–150. Association for Computational Linguistics, 2011.
- Maddison, C. J., Tarlow, D., and Minka, T. A* sampling. In Advances in Neural Information Processing Systems, pp. 3086–3094, 2014.
- Maddison, C. J., Mnih, A., and Teh, Y. W. The concrete distribution: A continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712, 2016.
- Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pp. 3111–3119, 2013.
- Peng, H., Long, F., and Ding, C. Feature selection based on mutual information criteria of max-dependency, maxrelevance, and min-redundancy. IEEE Transactions on pattern analysis and machine intelligence, 27(8):1226– 1238, 2005.
- Raffel, C., Luong, T., Liu, P. J., Weiss, R. J., and Eck, D. Online and linear-time attention by enforcing monotonic alignments. arXiv preprint arXiv:1704.00784, 2017.
- Ribeiro, M. T., Singh, S., and Guestrin, C. Why should i trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144. ACM, 2016.
- Shrikumar, A., Greenside, P., and Kundaje, A. Learning important features through propagating activation differences. In ICML, volume 70 of Proceedings of Machine Learning Research, pp. 3145–3153. PMLR, 06–11 Aug 2017.
- Simonyan, K., Vedaldi, A., and Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
- Sundararajan, M., Taly, A., and Yan, Q. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pp. 3319–3328, 2017.
- Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992.
- Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., and Bengio, Y. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, pp. 2048–2057, 2015.
- Yang, P., Chen, J., Hsieh, C.-J., Wang, J.-L., and Jordan, M. I. Greedy attack and gumbel attack: Generating adversarial examples for discrete data. arXiv preprint arXiv:1805.12316, 2018.
- Zhang, Y. and Wallace, B. A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification. arXiv preprint arXiv:1510.03820, 2015.
