Sparsifying Transformer Models with Trainable Representation Pooling

Annual Meeting of the Association for Computational Linguistics (2022)

Abstract
We propose a novel method to sparsify attention in the Transformer model by learning to select the most informative token representations during the training process, thus focusing on the task-specific parts of an input. A reduction of quadratic time and memory complexity to sublinear was achieved due to a robust trainable top-k operator.
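To make the pooling idea concrete, below is a minimal PyTorch sketch of the general mechanism the abstract describes: score each token representation with a learned layer and keep only the k highest-scoring ones before attention is applied, so attention runs over a shorter sequence. This is an illustrative assumption, not the paper's actual differentiable top-k operator; the class name `TopKPooling`, the linear scorer, and the sigmoid gating (used here so the scorer still receives gradients despite the hard selection) are all hypothetical choices.

```python
import torch
import torch.nn as nn


class TopKPooling(nn.Module):
    """Keep the k highest-scoring token representations (illustrative sketch).

    The paper's operator is a trainable, differentiable top-k; here a plain
    hard top-k plus a sigmoid gate stands in for it, for illustration only.
    """

    def __init__(self, d_model: int, k: int):
        super().__init__()
        self.k = k
        self.scorer = nn.Linear(d_model, 1)  # learned per-token importance score

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        scores = self.scorer(x).squeeze(-1)                      # (batch, seq_len)
        topk = torch.topk(scores, k=min(self.k, x.size(1)), dim=1).indices
        topk, _ = torch.sort(topk, dim=1)                        # preserve token order
        idx = topk.unsqueeze(-1).expand(-1, -1, x.size(-1))
        pooled = torch.gather(x, 1, idx)                         # (batch, k, d_model)
        # Gating by the (sigmoid of the) selected scores keeps the scorer in
        # the backward pass, even though hard top-k itself has no gradient.
        gate = torch.sigmoid(torch.gather(scores, 1, topk)).unsqueeze(-1)
        return pooled * gate


# Usage: shrink a 4096-token sequence to 512 representations before attention.
pool = TopKPooling(d_model=768, k=512)
hidden = torch.randn(2, 4096, 768)
print(pool(hidden).shape)  # torch.Size([2, 512, 768])
```

Because downstream attention then operates on k tokens instead of the full sequence length, its cost scales with k rather than quadratically with the original input length, which is the source of the sublinear complexity claimed in the abstract.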