Gradient-based Analysis of NLP Models is Manipulable

Junlin Wang
Jens Tuyls

EMNLP, pp. 247-258, 2020.


Abstract:

Gradient-based analysis methods, such as saliency map visualizations and adversarial input perturbations, have found widespread use in interpreting neural NLP models due to their simplicity, flexibility, and most importantly, their faithfulness. In this paper, however, we demonstrate that the gradients of a model are easily manipulable, ...
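As background for the abstract, the saliency maps it refers to attribute a model's prediction to input tokens via the gradient of the output score with respect to each token's embedding. A minimal sketch of that idea, using a hypothetical mean-pooled linear "model" (not the paper's setup) so the gradient can be written in closed form:

```python
import numpy as np

# Toy illustration (assumed, not from the paper): a bag-of-embeddings
# classifier with score(x) = w . mean(e_1, ..., e_n). For this model the
# gradient of the score w.r.t. each token embedding e_i is exactly w / n,
# so gradient-times-input saliency for token i reduces to (w / n) . e_i.

rng = np.random.default_rng(0)
vocab = ["the", "movie", "was", "terrible"]
emb = {tok: rng.normal(size=4) for tok in vocab}  # 4-dim token embeddings
w = rng.normal(size=4)                            # linear scoring weights

def saliency(tokens):
    """Gradient-times-input saliency for each token in the toy model."""
    n = len(tokens)
    grad = w / n  # d score / d e_i, identical for every token here
    return {tok: float(np.dot(grad, emb[tok])) for tok in tokens}

scores = saliency(vocab)
most_salient = max(scores, key=lambda t: abs(scores[t]))
```

The paper's point is that such gradients can be manipulated without changing model predictions, so rankings like `most_salient` need not reflect what the model actually relies on.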
