Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions
arXiv (2024)
Abstract
Large Language Models (LLMs), such as the GPT-4 and LLaMA families, have
demonstrated considerable success across diverse tasks, including
multiple-choice questions (MCQs). However, these models exhibit a positional
bias, most notably a severe anchored bias in the GPT-2 family, where they
consistently favour the first choice 'A' in MCQs during inference. This
anchored bias challenges the integrity of GPT-2's decision-making process, as
it skews performance based on the position rather than the content of the
choices in MCQs. In this study, we utilise the mechanistic interpretability
approach to identify the internal modules within GPT-2 models responsible for
this bias. We focus on the Multi-Layer Perceptron (MLP) layers and attention
heads, using the "logit lens" method to trace and modify the specific value
vectors that contribute to the bias. By updating these vectors within the MLP layers and
recalibrating attention patterns to neutralise the preference for the first
choice 'A', we effectively mitigate the anchored bias. Our interventions not
only correct the bias but also improve the overall MCQ prediction accuracy for
the GPT-2 family across various datasets. This work represents the first
comprehensive mechanistic analysis of anchored bias in MCQs within the GPT-2
models, introducing targeted, minimal-intervention strategies that
significantly enhance GPT-2 model robustness and accuracy in MCQs. Our code is
available at https://github.com/ruizheliUOA/Anchored_Bias_GPT2.
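
To make the "logit lens" step concrete, the sketch below (Python, using Hugging Face transformers) projects each GPT-2 MLP value vector through the unembedding matrix to flag vectors that strongly promote the answer token ' A', then applies a simple dampening edit. This is an illustrative reconstruction under stated assumptions, not the authors' released implementation: the top-5 ranking criterion and the projection-removal update are hypothetical choices; the paper's exact procedure is in the linked repository.

```python
# Minimal logit-lens sketch for locating 'A'-promoting MLP value vectors
# in GPT-2. Illustrative only; not the authors' released code.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

W_U = model.lm_head.weight             # (vocab, d_model) unembedding matrix
ln_f = model.transformer.ln_f          # final layer norm (logit-lens convention)
answer_id = tokenizer.encode(" A")[0]  # ' A' is a single BPE token in GPT-2

for layer_idx, block in enumerate(model.transformer.h):
    # In HF GPT-2, mlp.c_proj is a Conv1D with weight shape (d_mlp, d_model):
    # each row is one value vector written into the residual stream.
    value_vectors = block.mlp.c_proj.weight
    with torch.no_grad():
        token_logits = ln_f(value_vectors) @ W_U.T   # (d_mlp, vocab)
    top5 = token_logits.topk(5, dim=-1).indices
    promotes_A = (top5 == answer_id).any(dim=-1)     # mask over MLP neurons
    if promotes_A.any():
        print(f"layer {layer_idx}: {int(promotes_A.sum())} "
              f"value vectors rank ' A' in their top-5 tokens")
        # Hypothetical mitigation (an assumption, not the paper's exact
        # update rule): remove each flagged vector's component along the
        # ' A' unembedding direction so it no longer boosts that token.
        a_dir = W_U[answer_id] / W_U[answer_id].norm()
        with torch.no_grad():
            coeffs = value_vectors[promotes_A] @ a_dir
            value_vectors[promotes_A] -= coeffs.unsqueeze(-1) * a_dir
```

Projecting raw weight rows through the final layer norm and unembedding is the standard logit-lens approximation for reading weights in vocabulary space; it identifies candidate bias directions cheaply, while the paper's actual interventions also recalibrate attention patterns, which this sketch does not cover.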