Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling
CoRR (2024)
Abstract
We conduct a systematic study of the approximation properties of the Transformer
for sequence modeling with long, sparse, and complicated memory. We investigate
the mechanisms through which different components of the Transformer, such as
dot-product self-attention, positional encoding, and the feed-forward layer,
affect its expressive power, and we study their combined effects by establishing
explicit approximation rates. Our study reveals the roles of critical
hyperparameters in the Transformer, such as the number of layers and the number
of attention heads, and these insights also suggest natural alternative
architectures.
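Among the components the abstract names, dot-product self-attention is the one most directly tied to how the Transformer reads from past positions of a sequence. As an illustrative sketch (not the paper's construction), a single-head scaled dot-product self-attention layer can be written in a few lines of NumPy; the weight matrices Wq, Wk, Wv here are random placeholders:

```python
import numpy as np

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    """Single-head dot-product self-attention over a sequence X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # pairwise similarity scores, scaled by sqrt of the key dimension
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # (n, n)
    # softmax over the key axis: each row is a distribution over positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # output: each token is a weighted combination of all value vectors
    return weights @ V                             # (n, d)

# toy example: 4 tokens with embedding dimension 8, random weights
rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = scaled_dot_product_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Each output token is a convex combination of the value vectors of all positions, which is why the number of attention heads and layers governs how rich a memory pattern the model can represent.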