Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifiers

EMNLP 2020.

Keywords:
information flow by adding noise; interpretable neural text classification; area over the perturbation curve; prediction accuracy; …
TL;DR:
We propose the variational word mask method to automatically learn task-specific important words and reduce irrelevant information for classification, which improves the interpretability of model predictions

Abstract:

To build an interpretable neural text classifier, most of the prior work has focused on designing inherently interpretable models or finding faithful explanations. A new line of work on improving model interpretability has just started, and many existing methods require either prior information or human annotations as additional inputs …

Introduction
  • Neural network models have achieved remarkable performance on text classification due to their capacity of representation learning on natural language texts (Zhang et al, 2015; Yang et al, 2016; Joulin et al, 2017; Devlin et al, 2018).
  • Neural text classifiers can exhibit different levels of interpretability, even though they may have similar prediction performance.
  • Table 1 shows explanations extracted from two neural text classifiers with similar network architectures.
  • Although both models make correct predictions of the sentiment polarities of the two input texts, they give different explanations for their predictions (a minimal post-hoc explanation sketch follows this list).
  • Unlike prior work on improving model interpretability (Erion et al, 2019; Plumb et al, 2019), the proposed method does not require pre-defined important attributions or pre-collected explanations.
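For context on how explanations like those in Table 1 can be produced, below is a minimal sketch of extracting word-level attributions from a text classifier with LIME (Ribeiro et al, 2016). The toy bag-of-words pipeline and example sentences are illustrative stand-ins, not the models or data used in the paper.

```python
# Minimal sketch of word-level post-hoc explanations with LIME (Ribeiro et al, 2016).
# The bag-of-words pipeline is a stand-in for a neural text classifier; any model that
# maps raw strings to class probabilities can be wrapped the same way.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["a clever and moving film", "gimmicky and dull",
               "sharp, funny writing", "poor plot, poor direction"]
train_labels = [1, 0, 1, 0]  # toy data: 1 = positive, 0 = negative

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(train_texts, train_labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(
    "a clever but gimmicky film",   # illustrative input
    pipeline.predict_proba,         # maps raw strings -> class-probability array
    num_features=3,                 # top-3 words, as highlighted in Table 1
)
print(explanation.as_list())        # [(word, attribution weight), ...]
```

The same wrapper pattern applies to the CNN, LSTM, and BERT classifiers studied in the paper, since LIME only needs a function from raw strings to class probabilities.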
Highlights
  • Neural network models have achieved remarkable performance on text classification due to their capacity of representation learning on natural language texts (Zhang et al, 2015; Yang et al, 2016; Joulin et al, 2017; Devlin et al, 2018)
  • The contribution of this work is three-fold: (1) we proposed the variational word mask (VMASK) method to learn global task-specific important features that can improve both model interpretability and prediction accuracy; (2) we formulated the problem in the framework of information bottleneck (IB) (Tishby et al, 2000; Tishby and Zaslavsky, 2015) and derived a lower bound of the objective function via the variational IB method (Alemi et al, 2016); and (3) we evaluated the proposed method with three neural network models, CNN (Kim, 2014), LSTM (Hochreiter and Schmidhuber, 1997), and BERT (Devlin et al, 2018), on seven text classification tasks via both quantitative and qualitative evaluations
  • The models trained with VMASK outperform the base models with similar network architectures in prediction accuracy
  • We evaluate the local interpretability of VMASK-based models against the base models via the area over the perturbation curve (AOPC) score (Nguyen, 2018; Samek et al, 2016), and the global interpretability against the IBA-based models via post-hoc accuracy (Chen et al, 2018)
  • VMASK-based models achieve higher post-hoc accuracy than models based on IBA (which restricts information flow by adding noise), indicating that our proposed method is better at capturing task-specific important features
  • We proposed an effective method, VMASK, which learns global task-specific important features to improve both model interpretability and prediction accuracy (a schematic sketch of the idea follows this list)
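The following is a schematic sketch, not the authors' released implementation, of the general idea behind a variational word mask: a stochastic soft-binary mask over word embeddings, relaxed with Gumbel-softmax and regularized toward a fixed prior as an IB-style compression term. The class name, the prior, and the hyperparameters (tau, prior_keep, beta) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordMaskLayer(nn.Module):
    """Stochastic soft-binary mask over word embeddings (illustrative sketch)."""

    def __init__(self, embed_dim, tau=0.5, prior_keep=0.5):
        super().__init__()
        self.scorer = nn.Linear(embed_dim, 2)        # per-word logits for (mask, keep)
        self.tau = tau                                # Gumbel-softmax temperature
        self.register_buffer("prior", torch.tensor([1.0 - prior_keep, prior_keep]))

    def forward(self, word_embeds):
        # word_embeds: (batch, seq_len, embed_dim)
        logits = self.scorer(word_embeds)             # (batch, seq_len, 2)
        if self.training:
            probs = F.gumbel_softmax(logits, tau=self.tau, hard=False)
        else:
            probs = F.softmax(logits, dim=-1)         # expected mask at test time
        keep = probs[..., 1:]                         # probability of keeping each word
        masked_embeds = word_embeds * keep            # down-weight unimportant words
        # KL(q(mask | word) || prior) acts as the IB-style compression penalty.
        q = F.softmax(logits, dim=-1)
        kl = (q * (q.clamp_min(1e-8).log() - self.prior.clamp_min(1e-8).log())).sum(-1).mean()
        return masked_embeds, kl

# Training sketch: total_loss = cross_entropy(classifier(masked_embeds), labels) + beta * kl
```

The masked embeddings are fed to an ordinary text classifier (CNN, LSTM, or BERT), so the mask layer is model-agnostic; the KL weight beta trades off accuracy against how aggressively irrelevant words are suppressed.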
Methods
  • We use the post-hoc accuracy (Chen et al, 2018) to evaluate the influence of global task-specific important features on the predictions of VMASK- and IBA-based models.
  • VMASK-based models outperform IBA-based models with higher post-hoc accuracy, which indicates the proposed method is better at capturing task-specific important features.
  • BERT behaves somewhat differently, probably because it tends to use larger context via its self-attention when making predictions; this explains the pattern of the post-hoc accuracies of the BERT-based models.
Results
  • The authors trained the three models on the seven datasets with different training strategies.
  • Table 3 shows the prediction accuracy of different models on test sets.
  • The validation performance and average runtime are in Appendix D.
  • As shown in Table 3, all base models have prediction performance similar to the numbers reported in prior work (Appendix E).
  • The models trained with VMASK outperform the base models with similar network architectures in prediction accuracy.
Conclusion
  • Task-specific important words.
  • Figure 3 visualizes the top 10 important words for the VMASK- and IBA-based models on three datasets via word clouds.
  • The authors proposed an effective method, VMASK, which learns global task-specific important features to improve both model interpretability and prediction accuracy.
  • The authors tested VMASK with three different neural text classifiers on seven benchmark datasets, and assessed its effectiveness via both quantitative and qualitative evaluations.
Objectives
  • As the goal of this work is to propose a novel training method that improves both prediction accuracy and interpretability, the authors employ two groups of models as baselines and competitive systems.
Tables
  • Table1: Model A and B are two neural text classifiers with similar network architectures. They both make correct sentiment predictions on both texts (ex. 1: positive; ex. 2: negative). Two post-hoc explanation methods, LIME (Ribeiro et al, 2016) and SampleShapley (Kononenko et al, 2010), are used to explain the model predictions on examples 1 and 2, respectively. The top three important words are shown in pink or blue for models A and B. Whichever post-hoc method is used, explanations from model B are easier to understand because the sentiment keywords “clever” and “gimmicky” are highlighted
  • Table2: Summary statistics for the datasets, where C is the number of classes, L is average sentence length, and # counts the number of examples in the train/dev/test sets
  • Table3: Prediction accuracy (%) of different models with different training strategies on the seven datasets
  • Table4: AOPCs (%) of LIME and SampleShapley in interpreting the base and VMASK-based models on the seven datasets
  • Table5: Examples of the explanations generated by LIME for different models on the IMDB dataset, where the top three important words are highlighted. The color saturation indicates word attribution
  • Table6: Post-hoc global important words selected by SP-LIME for different models on the IMDB dataset
  • Table7: Pre-processing details on the datasets. vocab: vocab size; threshold: low-frequency threshold; length: mini-batch sentence length
  • Table8: Validation accuracy (%) for each reported test accuracy
  • Table9: Average runtime (s/epoch) for each approach on each dataset
  • Table10: Results of prediction accuracy (%) collected from previous papers
  • Table11: Pearson correlation coefficients of VMASK-based models on the seven datasets
Related work
  • Various approaches have been proposed to interpret DNNs, ranging from designing inherently interpretable models (Alvarez-Melis and Jaakkola, 2018; Rudin, 2019), to tracking the inner workings of neural networks (Jacovi et al, 2018; Murdoch et al, 2018), to generating post-hoc explanations (Ribeiro et al, 2016; Lundberg and Lee, 2017). Beyond interpreting model predictions, explanation-generation methods are also promising for improving model performance. We propose an information-theoretic method to improve both prediction accuracy and interpretability.

    Explanation from the information-theoretic perspective. A line of work that motivates ours leverages information theory to produce explanations, either by maximizing mutual information to recognize important features (Chen et al, 2018; Guan et al, 2019) or by optimizing the information bottleneck to identify feature attributions (Schulz et al, 2020; Bang et al, 2019). The information-theoretic approaches are efficient and flexible in identifying important features. Different from generating post-hoc explanations for well-trained models, we utilize the information bottleneck to train a more interpretable model with better prediction performance.
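For reference, the generic variational lower bound on the IB objective from Alemi et al (2016), which the VMASK objective builds on, has the following form. The notation (X: input, Y: label, Z: compressed representation, β: trade-off weight) is generic rather than the paper's exact parameterization.

```latex
% Generic variational IB lower bound (Alemi et al., 2016); notation is illustrative.
I(Z;Y) - \beta\, I(Z;X)
\;\ge\;
\mathbb{E}_{p(x,y)}\,\mathbb{E}_{q_\theta(z\mid x)}\!\big[\log q_\phi(y\mid z)\big]
\;-\; \beta\,\mathbb{E}_{p(x)}\,\mathrm{KL}\!\big(q_\theta(z\mid x)\,\big\|\,r(z)\big)
```

The model is trained by maximizing the right-hand side: the first term encourages the masked representation to remain predictive of the label, while the KL term compresses away task-irrelevant information.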
Study subjects and analysis
Post-hoc accuracy. The post-hoc accuracy (Chen et al, 2018) is used to evaluate the influence of global task-specific important features on the predictions of VMASK- and IBA-based models. For each test example, the top k words are selected based on their global importance scores and fed to the model to make a prediction, which is compared with the original prediction made on the whole input text:

post-hoc-acc(k) = \frac{1}{M} \sum_{m=1}^{M} \mathbb{1}\big[ y_m^{(k)} = y_m \big],

where M is the number of examples, y_m is the predicted label on the m-th test example, and y_m^{(k)} is the predicted label based on the top k important words. Figure 1 shows the results of VMASK- and IBA-based models on the seven datasets with k ranging from 1 to 10: VMASK-based models (solid lines) outperform IBA-based models (dotted lines) with higher post-hoc accuracy, which indicates the proposed method is better at capturing task-specific important features.
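A minimal sketch of the post-hoc accuracy metric defined above follows. The `predict` function (mapping a list of strings to predicted label ids) and the per-example importance rankings are placeholders for a trained classifier and its learned global word importances.

```python
def post_hoc_accuracy(ranked_words_per_example, full_text_predictions, predict, k):
    """Fraction of examples whose prediction on the top-k important words
    matches the prediction made on the whole input text."""
    topk_inputs = [" ".join(words[:k]) for words in ranked_words_per_example]
    topk_predictions = predict(topk_inputs)
    agreements = [int(p_k == p_full)
                  for p_k, p_full in zip(topk_predictions, full_text_predictions)]
    return sum(agreements) / len(agreements)

# Usage sketch: curve = [post_hoc_accuracy(rankings, full_preds, predict, k)
#                        for k in range(1, 11)]   # k from 1 to 10, as in Figure 1
```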

Datasets. We adopt seven benchmark datasets: movie reviews IMDB (Maas et al, 2011), Stanford Sentiment Treebank with fine-grained labels SST-1 and its binary version SST-2 (Socher et al, 2013), Yelp reviews (Zhang et al, 2015), AG’s News (Zhang et al, 2015), 6-class question classification TREC (Li and Roth, 2002), and subjective/objective classification Subj (Pang and Lee, 2005). For the datasets (e.g. IMDB, Subj) without standard train/dev/test split, we hold out a proportion of training examples as the development set
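For the datasets without an official development split, holding out part of the training data can be done with a standard stratified split. The 10% ratio and the toy corpus below are illustrative assumptions, not the proportions used in the paper.

```python
# Minimal sketch: hold out part of the training set as a development set for
# datasets (e.g., IMDB, Subj) that ship without a standard train/dev/test split.
from sklearn.model_selection import train_test_split

texts = [f"placeholder review {i}" for i in range(20)]   # stand-in corpus
labels = [i % 2 for i in range(20)]                      # toy binary labels

train_texts, dev_texts, train_labels, dev_labels = train_test_split(
    texts, labels, test_size=0.1, stratify=labels, random_state=42  # 10% dev split (illustrative)
)
```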

For each of the other datasets, we randomly pick 1,000 examples for evaluation due to computation costs. Table 4 shows the AOPCs of different models on the seven datasets when deleting the top 5 words identified by LIME or SampleShapley. The AOPCs of VMASK-based models are significantly higher than those of the base models on most of the datasets, indicating that VMASK can improve a model's interpretability with respect to post-hoc explanations.
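As a concrete reference, the AOPC reported in Table 4 can be computed roughly as follows: delete the top-k words ranked by a post-hoc explainer (LIME or SampleShapley), measure the drop in the probability of the originally predicted class, and average over k and over examples. The `predict_proba` function and the word rankings below are placeholders, and word deletion by string matching is a simplification of the actual perturbation procedure.

```python
# A minimal sketch of the area over the perturbation curve (AOPC), assuming
# `predict_proba` maps a list of strings to class-probability rows and
# `ranked_words_per_text` holds each example's words sorted by attribution.
import numpy as np

def aopc(texts, ranked_words_per_text, predict_proba, K=5):
    drops = []
    for text, ranked_words in zip(texts, ranked_words_per_text):
        probs = predict_proba([text])[0]
        label = int(np.argmax(probs))            # originally predicted class
        base = probs[label]
        per_k = []
        for k in range(1, K + 1):
            to_delete = set(ranked_words[:k])    # top-k words from LIME / SampleShapley
            perturbed = " ".join(w for w in text.split() if w not in to_delete)
            per_k.append(base - predict_proba([perturbed])[0][label])
        drops.append(sum(per_k) / (K + 1))
    return float(np.mean(drops))
```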

Task-specific important words. Figure 3 visualizes the top 10 important words for the VMASK- and IBA-based models on three datasets via word clouds. We can see that the words selected by VMASK are consistent with the corresponding topic, such as “funnest” and “awsome” for sentiment analysis, and “encyclopedia” and “spaceport” for news classification.
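A word cloud like Figure 3 can be rendered directly from a word-to-importance mapping; the scores below are illustrative placeholders rather than values learned by VMASK.

```python
# Minimal sketch: render task-specific important words as a word cloud, given a
# dict of global importance scores (placeholder values shown here).
import matplotlib.pyplot as plt
from wordcloud import WordCloud

word_importance = {"funnest": 0.92, "awsome": 0.88, "hilarious": 0.83,
                   "encyclopedia": 0.77, "spaceport": 0.74}   # illustrative scores

cloud = WordCloud(width=400, height=300, background_color="white")
cloud.generate_from_frequencies(word_importance)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```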

Figures
  • Figure1: Post-hoc accuracy of VMASK- and IBA-based models on the seven datasets
  • Figure2: Scatter plot of word global importance and frequency (in log scale) of LSTM-VMASK on the Yelp dataset, where red dots represent the top 10 important sentiment words and green dots represent the top 10 high-frequency words
  • Figure3: Word clouds of the top 10 important words for the VMASK- and IBA-based models on three datasets

Reference
  • Alexander A Alemi, Ian Fischer, Joshua V Dillon, and Kevin Murphy. 2016. Deep variational information bottleneck. arXiv preprint arXiv:1612.00410.
  • Seojin Bang, Pengtao Xie, Heewook Lee, Wei Wu, and Eric Xing. 2019. Explaining a black-box using deep variational information bottleneck approach. arXiv preprint arXiv:1902.06918.
  • Joost Bastings, Wilker Aziz, and Ivan Titov. 2019. Interpretable neural predictions with differentiable binary variables. arXiv preprint arXiv:1905.08160.