Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling

2021 18th International SoC Design Conference (ISOCC), 2021

Abstract
Recently, the necessity of multiple attention heads in the transformer architecture has been questioned [1]. Removing less important heads from a large network is a promising strategy for reducing computation cost and parameters. However, pruning attention heads in multi-head attention does not evenly reduce the overall load, because the feedforward modules are not affected. In this study, we apply attent...
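To illustrate why head pruning only shrinks the attention sub-layer, the PyTorch sketch below masks selected heads inside one attention layer while any feedforward sub-layer (not shown) keeps its full size. The module, the `head_mask` buffer, and the keep/drop decision are illustrative assumptions, not the paper's sensitivity-based layer-wise criterion.

```python
import torch
import torch.nn as nn

class PrunableMultiHeadAttention(nn.Module):
    """Self-attention whose heads can be disabled per layer (illustrative sketch)."""
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # 1 = keep head, 0 = pruned head; set independently for each layer
        self.register_buffer("head_mask", torch.ones(num_heads))

    def forward(self, x):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda t: t.view(B, T, self.num_heads, self.d_head).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        ctx = attn @ v                                  # (B, heads, T, d_head)
        # Zero out pruned heads; a deployed model would instead drop the
        # corresponding projection rows/columns to save real compute.
        ctx = ctx * self.head_mask.view(1, -1, 1, 1)
        return self.out(ctx.transpose(1, 2).reshape(B, T, -1))

# Example: prune half the heads in this layer; the FFN block is untouched.
mha = PrunableMultiHeadAttention()
mha.head_mask[4:] = 0.0
y = mha(torch.randn(2, 16, 512))
```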
Keywords
Training, Costs, Sensitivity, Computational modeling, Computer architecture, Transformers, Computational efficiency