Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding

Luowei Zhou
Yen-Chun Chen

Abstract:

Transformer has become ubiquitous in deep learning. One key ingredient behind its success is the self-attention mechanism, which allows fully-connected contextual encoding over input tokens. However, despite its effectiveness in modeling short sequences, self-attention suffers when handling inputs with extreme l…
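To make the bottleneck concrete, here is a minimal sketch of plain (dense) self-attention using NumPy. It is an illustration only, not the paper's method: queries, keys, and values are taken as the input itself rather than learned projections, and the point is that the score matrix is n × n, so cost grows quadratically with sequence length.

```python
import numpy as np

def self_attention(X):
    """Dense self-attention: every token attends to every other token.

    The score matrix is (n, n), so memory and compute grow
    quadratically with sequence length n -- the long-range
    bottleneck that sparse variants like Cluster-Former target.
    """
    n, d = X.shape
    # Illustration only: use X as queries, keys, and values directly
    # (a real Transformer applies learned projections W_q, W_k, W_v).
    scores = X @ X.T / np.sqrt(d)  # (n, n) pairwise attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ X  # contextualized token representations, (n, d)

X = np.random.default_rng(0).normal(size=(8, 4))
out = self_attention(X)
print(out.shape)  # (8, 4)
```

Doubling n doubles the output rows but quadruples the score matrix, which is why fully-connected attention becomes impractical for very long inputs.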
