Gradient-based Analysis of NLP Models is Manipulable
EMNLP, pp. 247-258, 2020.
Abstract:
Gradient-based analysis methods, such as saliency map visualizations and adversarial input perturbations, have found widespread use in interpreting neural NLP models due to their simplicity, flexibility, and, most importantly, their faithfulness. In this paper, however, we demonstrate that the gradients of a model are easily manipulable ...
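To illustrate the kind of gradient-based saliency the abstract refers to, here is a minimal sketch of computing token-level saliency for a toy classifier. The model, vocabulary, and class choice below are illustrative assumptions, not the paper's setup: saliency is taken as the L2 norm of the gradient of a class score with respect to each token's embedding.

```python
# Minimal sketch of gradient-based saliency for a toy text classifier.
# The embedding size, vocabulary, and classifier here are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, embed_dim, num_classes = 10, 8, 2
embedding = nn.Embedding(vocab_size, embed_dim)
classifier = nn.Linear(embed_dim, num_classes)

token_ids = torch.tensor([1, 4, 7, 2])  # a toy "sentence" of 4 tokens

# Look up embeddings and track gradients with respect to them.
embeds = embedding(token_ids).detach().requires_grad_(True)
logits = classifier(embeds.mean(dim=0))  # mean-pool tokens, then classify
score = logits[0]                        # score of an arbitrary class
score.backward()

# Saliency: L2 norm of the gradient at each token's embedding.
saliency = embeds.grad.norm(dim=1)
print(saliency)
```

Methods of this form attribute importance to each input token; the paper's point is that such gradients can be manipulated without changing model predictions.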