Random Feature Attention

ICLR 2021.

TL;DR:
We propose a random-feature-based attention that scales linearly in sequence length, and performs on par with strong transformer baselines on language modeling and machine translation.

Abstract:

Transformers are state-of-the-art models for a variety of sequence modeling tasks. At their core is an attention function which models pairwise interactions between the inputs at every timestep. While attention is powerful, it does not scale efficiently to long sequences due to its quadratic time and space complexity in the sequence length. …
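
A minimal sketch of the idea described above (not the paper's released implementation): softmax attention approximated with random Fourier features, so that keys and values are summarized once and the computation is linear in the sequence length. The function names, the feature dimension, the ℓ2-normalization of queries and keys, and the omission of the usual 1/sqrt(d) temperature are illustrative assumptions.

```python
import numpy as np

def random_fourier_features(x, W):
    # phi(x) . phi(y) approximates the Gaussian kernel exp(-||x - y||^2 / 2)
    # (random Fourier features); rows of W are drawn from N(0, I).
    proj = x @ W.T                                   # (..., D)
    D = W.shape[0]
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1) / np.sqrt(D)

def rfa_attention(Q, K, V, num_features=128, seed=0):
    # Q, K: (n, d), assumed l2-normalized; V: (n, d_v).
    # Cost is O(n * D * d) instead of the O(n^2 * d) of exact softmax attention.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((num_features, Q.shape[-1]))

    phi_q = random_fourier_features(Q, W)            # (n, 2D)
    phi_k = random_fourier_features(K, W)            # (n, 2D)

    # Summarize all keys and values once, then reuse for every query.
    S = phi_k.T @ V                                  # (2D, d_v)
    z = phi_k.sum(axis=0)                            # (2D,)

    num = phi_q @ S                                  # (n, d_v)
    den = phi_q @ z                                  # (n,), noisy for small num_features
    return num / den[:, None]

# Illustrative usage with random, l2-normalized inputs.
n, d, d_v = 512, 64, 64
rng = np.random.default_rng(1)
Q = rng.standard_normal((n, d)); Q /= np.linalg.norm(Q, axis=-1, keepdims=True)
K = rng.standard_normal((n, d)); K /= np.linalg.norm(K, axis=-1, keepdims=True)
V = rng.standard_normal((n, d_v))
out = rfa_attention(Q, K, V)                         # (n, d_v)
```

For unit-norm q and k, exp(q·k) equals a constant multiple of the Gaussian kernel exp(-||q - k||^2 / 2), and that constant cancels between numerator and denominator, so the ratio above approximates softmax attention; a larger num_features gives a closer approximation.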
