Random Feature Attention
ICLR 2021.
We propose RFA, a random-feature-based attention mechanism that scales linearly with sequence length and performs on par with strong transformer baselines on language modeling and machine translation.
Transformers are state-of-the-art models for a variety of sequence modeling tasks. At their core is an attention function which models pairwise interactions between the inputs at every timestep. While attention is powerful, it does not scale efficiently to long sequences due to its quadratic time and space complexity in the sequence length.
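The core idea can be sketched as follows: softmax attention is approximated with a random feature map whose inner products estimate the exponential kernel, so the key-value summary can be accumulated once and reused for every query, giving linear rather than quadratic cost in sequence length. This is a minimal NumPy sketch under simplifying assumptions (ℓ2-normalized queries and keys, sin/cos random Fourier features for the Gaussian kernel); the function and variable names are illustrative, not the paper's implementation.

```python
import numpy as np

def random_feature_map(x, W):
    """Map inputs x (..., d) through random projections W (D, d).

    For W drawn i.i.d. from N(0, I), the inner product of two feature
    vectors estimates exp(-||x - y||^2 / 2); with unit-norm x and y this
    is proportional to exp(x . y), the softmax attention kernel.
    """
    proj = x @ W.T  # (..., D)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1) / np.sqrt(W.shape[0])

def random_feature_attention(q, k, v, W, eps=1e-6):
    """Linear-time attention: q, k are (n, d) and unit-norm; v is (n, d_v)."""
    phi_q = random_feature_map(q, W)       # (n, 2D)
    phi_k = random_feature_map(k, W)       # (n, 2D)
    # Key-value summary and normalizer, each computed once in O(n):
    S = phi_k.T @ v                        # (2D, d_v)
    z = phi_k.sum(axis=0)                  # (2D,)
    return (phi_q @ S) / (phi_q @ z + eps)[:, None]
```

With enough random features, the output closely tracks exact softmax attention, while time and memory grow linearly in the number of timesteps instead of quadratically.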