Are Sixteen Heads Really Better than One?

Advances in Neural Information Processing Systems 32 (NeurIPS 2019), pp. 14014–14024, 2019.

Abstract:

Attention is a powerful and ubiquitous mechanism for allowing neural models to focus on particular salient pieces of information by taking their weighted average when making predictions. In particular, multi-headed attention is a driving force behind many recent state-of-the-art natural language processing (NLP) models such as Transformer-…
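The abstract's central observation is that many attention heads can be masked out at test time with little impact on performance. The sketch below is a plain PyTorch illustration of that idea, not the authors' implementation: multi-head attention with an optional per-head mask that zeroes pruned heads before the output projection. Names such as MaskedMultiHeadAttention and head_mask are assumptions made for illustration.

```python
# Minimal sketch (not the authors' released code) of multi-head attention
# with a per-head mask, illustrating how heads can be "removed" at test time.
from typing import Optional

import torch
import torch.nn.functional as F
from torch import nn


class MaskedMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, head_mask: Optional[torch.Tensor] = None):
        # x: (batch, seq_len, d_model); head_mask: (n_heads,), 1 = keep, 0 = prune.
        batch, seq_len, _ = x.shape

        def split(t: torch.Tensor) -> torch.Tensor:
            # (batch, seq_len, d_model) -> (batch, n_heads, seq_len, d_head)
            return t.view(batch, seq_len, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))

        # Scaled dot-product attention: each head takes a weighted average of values.
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        attn = F.softmax(scores, dim=-1)
        context = attn @ v  # (batch, n_heads, seq_len, d_head)

        if head_mask is not None:
            # Zero out the contribution of pruned heads.
            context = context * head_mask.view(1, self.n_heads, 1, 1)

        context = context.transpose(1, 2).reshape(batch, seq_len, -1)
        return self.out_proj(context)


# Example: keep only one of sixteen heads at inference.
layer = MaskedMultiHeadAttention(d_model=512, n_heads=16)
x = torch.randn(2, 10, 512)
mask = torch.zeros(16)
mask[0] = 1.0
out = layer(x, head_mask=mask)
print(out.shape)  # torch.Size([2, 10, 512])
```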
