FLuRKA: Fast and accurate unified Low-Rank Kernel Attention
CoRR (2023)
Abstract
Many efficient approximate self-attention techniques have become
prevalent since the inception of the transformer architecture. Two popular
classes of these techniques are low-rank and kernel methods. Each of these
methods has its strengths. We observe that these strengths complement each
other, and exploit this synergy to fuse low-rank and kernel methods,
producing a new class of transformers: FLuRKA (Fast Low-Rank Kernel Attention).
FLuRKA are highly training-efficient: they run faster than their constituent
low-rank and kernel methods while achieving similar model quality. We evaluate
the speed and quality of FLuRKA both theoretically and empirically. Our speed
analysis identifies a variety of parameter configurations in which FLuRKA
exhibit speedups over low-rank and kernel approximations, and our quality
analysis bounds the error of FLuRKA with respect to full attention.
Empirically, we instantiate three FLuRKA variants, which achieve speedups of
up to 3.3x and 1.7x over low-rank and kernel methods respectively; this
translates to speedups of up to 20x over models using flash-attention. Across
a diverse set of tasks spanning language modeling, language understanding,
long-sequence modeling, machine translation, and image classification, FLuRKA
achieve accuracy comparable to the underlying low-rank and kernel
approximations, occasionally surpassing both.
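
The abstract does not spell out the fusion mechanism. As a rough, hypothetical sketch of how a low-rank method (a Linformer-style down-projection of keys and values) could compose with a kernel method (a linear-attention-style feature map), consider the NumPy code below. The function names, the elu(x)+1 feature map, and the projection matrices E and F are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a simple positive feature map used in linear attention
    # (Katharopoulos et al., "Transformers are RNNs"); an illustrative
    # stand-in for whatever kernel FLuRKA actually uses.
    return np.where(x > 0, x + 1.0, np.exp(x))

def low_rank_kernel_attention(Q, K, V, E, F):
    """Hypothetical fusion sketch (not the paper's implementation):
    1) low-rank step: project K, V from sequence length n down to k
       with (k, n) matrices E, F, as in Linformer;
    2) kernel step: linear attention over the shortened sequence,
       which never materializes an n x n attention matrix.
    Shapes: Q, K, V are (n, d); output is (n, d)."""
    K_low = E @ K                   # (k, d)
    V_low = F @ V                   # (k, d)
    Qp = feature_map(Q)             # (n, d)
    Kp = feature_map(K_low)         # (k, d)
    KV = Kp.T @ V_low               # (d, d) summary of keys and values
    Z = Qp @ Kp.sum(axis=0)         # (n,) per-query normalizer
    return (Qp @ KV) / Z[:, None]   # (n, d)

# Toy usage with random weights.
n, d, k = 128, 64, 16
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) * 0.1 for _ in range(3))
E, F = (rng.standard_normal((k, n)) / np.sqrt(n) for _ in range(2))
out = low_rank_kernel_attention(Q, K, V, E, F)
print(out.shape)  # (128, 64)
```

Composing the two approximations this way means the kernelized attention runs over k rather than n key/value positions, which is one plausible source of the speedups the abstract reports over either method alone.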