Learning to Deceive with Attention-Based Explanations
ACL, pp. 4782-4793, 2020.
Amid practices that treat attention scores as an indication of what a model focuses on, we show that attention scores are manipulable.
Attention mechanisms are ubiquitous components in neural architectures applied in natural language processing. In addition to yielding gains in predictive accuracy, researchers often claim that attention weights confer interpretability, purportedly useful both for providing insights to practitioners and for explaining why a model makes …