A Mixture of h-1 Heads is Better than h Heads

ACL 2020.


Abstract:

Multi-head attentive neural architectures have achieved state-of-the-art results on a variety of natural language processing tasks. Evidence has shown that they are overparameterized; attention heads can be pruned without significant performance loss. In this work, we instead "reallocate" them: the model learns to activate different heads...
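The truncated abstract points at the core idea: rather than pruning redundant heads, let the model learn an input-dependent gate that decides how much each head contributes. Below is a minimal PyTorch sketch of that idea, assuming a softmax gate over individual heads computed from a mean-pooled input representation. The class name `GatedMultiHeadAttention` and the gating network are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedMultiHeadAttention(nn.Module):
    """Multi-head attention whose heads are combined via a learned,
    input-dependent gate instead of a fixed uniform concatenation.
    Illustrative sketch only, not the authors' exact method."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.h = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.o_proj = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(d_model, n_heads)  # hypothetical gating network

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq_len, d_model]
        B, T, _ = x.shape

        def split(t):  # [B, T, d_model] -> [B, h, T, d_head]
            return t.view(B, T, self.h, self.d_head).transpose(1, 2)

        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))

        # Standard scaled dot-product attention, computed per head.
        scores = torch.matmul(q, k.transpose(-2, -1)) / (self.d_head ** 0.5)
        heads = torch.matmul(F.softmax(scores, dim=-1), v)  # [B, h, T, d_head]

        # Input-dependent gate over heads, from a mean-pooled representation.
        g = F.softmax(self.gate(x.mean(dim=1)), dim=-1)  # [B, h]

        # Scale each head by h * g_i; a uniform gate (g_i = 1/h) recovers the
        # ordinary concatenation, while a peaked gate "reallocates" capacity
        # toward the heads it activates.
        heads = heads * (self.h * g)[:, :, None, None]

        out = heads.transpose(1, 2).reshape(B, T, self.h * self.d_head)
        return self.o_proj(out)


if __name__ == "__main__":
    attn = GatedMultiHeadAttention(d_model=64, n_heads=8)
    y = attn(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

With a uniform gate this layer reduces to standard multi-head attention, so the sketch can be read as a reparameterization rather than a structural change; how the gate is trained and regularized is where the paper's actual contribution lies.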
