Learning to Deceive with Attention-Based Explanations

ACL 2020, pp. 4782–4793.

Cited by: 35
Summary:
Amidst practices that treat attention scores as an indication of what a model focuses on, we show that attention scores are manipulable.

Abstract:

Attention mechanisms are ubiquitous components in neural architectures applied in natural language processing. In addition to yielding gains in predictive accuracy, attention weights are often claimed to confer interpretability, purportedly useful both for providing insights to practitioners and for explaining why a model makes its decisions to stakeholders. We call the latter use of attention mechanisms into question by demonstrating a simple method for training models to produce deceptive attention masks. Our method diminishes the total weight assigned to designated impermissible tokens, even when the models can be shown to nevertheless rely on these features to drive predictions. Across multiple models and tasks, our approach manipulates attention weights while paying surprisingly little cost in accuracy.
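As a rough illustration of the kind of objective the abstract describes, the sketch below adds a penalty to the task loss for attention mass placed on designated impermissible tokens. All names are hypothetical and the penalty is a simple linear term; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def deceptive_attention_loss(logits, labels, attention, impermissible_mask,
                             penalty_weight=0.1):
    """Sketch of a manipulated training objective (assumed, not the paper's exact loss).

    logits:             (batch, num_classes) model predictions
    labels:             (batch,) gold labels
    attention:          (batch, seq_len) attention weights; each row sums to 1
    impermissible_mask: (batch, seq_len) 1.0 where a token is designated
                        impermissible (e.g., gendered words), else 0.0
    penalty_weight:     trade-off coefficient between accuracy and deception
    """
    # Ordinary task loss: the model must still predict well.
    task_loss = F.cross_entropy(logits, labels)

    # Total attention mass assigned to impermissible tokens, per example.
    impermissible_mass = (attention * impermissible_mask).sum(dim=-1)

    # Penalizing that mass pushes the model toward attention maps that *look*
    # as if the impermissible tokens were ignored, even if the model still
    # relies on them through other pathways.
    penalty = impermissible_mass.mean()

    return task_loss + penalty_weight * penalty
```

In training, this loss would stand in for the plain task loss; penalty_weight controls how aggressively attention is steered away from the flagged tokens, mirroring the abstract's observation that the manipulation costs surprisingly little accuracy.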