Exploring the Residual Stream of Transformers
CoRR (2023)
Abstract
Transformer-based models have achieved great breakthroughs in recent years.
However, many significant questions about why these models produce such
powerful outputs remain unanswered. We do not know how to locate the important
parameters that store the knowledge for predicting the next word, or whether
these parameters lie in the same layer/module or in different ones. Moreover,
we do not understand the mechanism by which this knowledge is merged into the
final embedding for next-word prediction. In this paper, we explore the
residual stream of transformers to increase interpretability. We find that the
mechanism behind the residual connection is a direct addition on the
before-softmax values, so the probabilities of tokens with larger
before-softmax values increase. Moreover, we prove that using the log
probability increase as a contribution score is reasonable, and on this basis
we can locate important parameters. In addition, we propose a method to
analyze how previous layers affect upper layers by comparing inner products.
The experimental results and a case study show that our research can increase
the interpretability of transformer-based models. We will release our code at
https://github.com/zepingyu0512/residualstream.
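The abstract's two central claims can be made concrete with a toy computation.
The following is a minimal sketch, not the authors' released code: the
unembedding matrix `W_U`, the hidden state, and the residual update are random
stand-ins. It illustrates (1) that adding a residual update to the hidden
state is equivalent to a direct addition on the before-softmax values
(logits), so tokens whose logits grow gain probability, and (2) how the log
probability increase of a token can be read as a contribution score for that
update.

```python
# Minimal sketch (assumed toy setup, not the paper's implementation).
import torch
import torch.nn.functional as F

torch.manual_seed(0)

d_model, vocab = 16, 10
W_U = torch.randn(d_model, vocab)       # toy unembedding matrix (assumption)

hidden = torch.randn(d_model)           # residual stream before a layer
update = torch.randn(d_model)           # the layer's output, added back in

logits_before = hidden @ W_U            # before-softmax values, pre-update
logits_after = (hidden + update) @ W_U  # post-update before-softmax values

# (1) The residual connection acts as a direct addition on the logits:
# (hidden + update) @ W_U == hidden @ W_U + update @ W_U.
assert torch.allclose(logits_after, logits_before + update @ W_U, atol=1e-5)

# (2) Log probability increase of the predicted token as a contribution
# score for this residual update.
target = logits_after.argmax()
log_p_before = F.log_softmax(logits_before, dim=-1)[target]
log_p_after = F.log_softmax(logits_after, dim=-1)[target]
contribution = (log_p_after - log_p_before).item()
print(f"log probability increase (contribution score): {contribution:.4f}")
```

In a real model the same comparison would be made per layer or per module,
ranking residual updates by how much they raise the log probability of the
predicted token; components with the largest scores are candidates for the
important parameters the paper aims to locate.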