A Question on the Explainability of Large Language Models and the Word-Level Univariate First-Order Plausibility Assumption
CoRR (2024)
Abstract
The explanations of large language models have recently been shown to be
sensitive to the randomness used in their training, creating a need to
characterize this sensitivity. In this paper, we propose a characterization
that questions the possibility of providing simple and informative
explanations for such models. To this end, we give statistical definitions
for the explanations' signal, noise and signal-to-noise ratio. We highlight
that, in a typical case study where word-level univariate explanations are
analyzed with first-order statistical tools, the explanations of simple
feature-based models carry more signal and less noise than those of
transformer-based ones. We then discuss the possibility of improving these
results with alternative definitions of signal and noise that would capture
more complex explanations and analysis methods, while also questioning the
tradeoff with their plausibility for readers.
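The abstract does not spell out the statistical definitions, but the general idea of measuring explanation sensitivity to training randomness can be sketched with first-order statistics. The following is a minimal illustration, not the paper's actual formulation: it assumes word-level attribution scores collected from several models trained with different random seeds, and takes signal as the per-word mean across seeds and noise as the per-word standard deviation.

```python
import numpy as np

# Hypothetical word-level attribution scores for one input sentence,
# from models trained with 5 different random seeds (rows = seeds,
# columns = words). The numbers are illustrative, not from the paper.
attributions = np.array([
    [0.80, 0.10, 0.05, 0.60],
    [0.70, 0.20, 0.00, 0.50],
    [0.90, 0.10, 0.10, 0.70],
    [0.75, 0.15, 0.05, 0.55],
    [0.85, 0.05, 0.00, 0.65],
])

# One plausible first-order reading: signal = per-word mean across seeds,
# noise = per-word standard deviation across seeds.
signal = attributions.mean(axis=0)
noise = attributions.std(axis=0)

# A word-level signal-to-noise ratio, guarding against zero noise.
snr = signal / np.maximum(noise, 1e-12)

print("signal:", signal)
print("noise: ", noise)
print("SNR:   ", snr)
```

Under this reading, a word whose attribution is stable across seeds (high SNR) is one whose explanatory role survives training randomness, while a high-variance word is mostly noise.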