Few-Shot Recalibration of Language Models
CoRR (2024)
Abstract
Recent work has uncovered promising ways to extract well-calibrated
confidence estimates from language models (LMs), where the model's confidence
score reflects how likely it is to be correct. However, while LMs may appear
well-calibrated over broad distributions, this often hides significant
miscalibration within narrower slices (e.g., systematic over-confidence in math
can balance out systematic under-confidence in history, yielding perfect
calibration in aggregate). To attain well-calibrated confidence estimates for
any slice of a distribution, we propose a new framework for few-shot
slice-specific recalibration. Specifically, we train a recalibration model that
takes in a few unlabeled examples from any given slice and predicts a curve
that remaps confidence scores to be more accurate for that slice. Our trained
model can recalibrate for arbitrary new slices, without using any labeled data
from that slice. This enables us to identify domain-specific confidence
thresholds above which the LM's predictions can be trusted, and below which it
should abstain. Experiments show that our few-shot recalibrator consistently
outperforms existing calibration methods, for instance improving calibration
error for PaLM2-Large on MMLU by 16%.
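
To make the setup concrete, the sketch below illustrates how a slice-specific recalibration curve would be used downstream: raw confidence scores are remapped by a predicted curve, and a slice-specific threshold decides whether the LM answers or abstains. This is a minimal sketch under stated assumptions; the curve here is a simple temperature-style remapping standing in for whatever curve a trained few-shot recalibrator would predict, and all names and values are illustrative, not the paper's implementation.

```python
import numpy as np

def recalibrate_and_decide(raw_confidences, curve, threshold=0.8):
    """Apply a slice-specific recalibration curve and decide when to abstain.

    `curve` is assumed to be a callable mapping raw confidence -> recalibrated
    confidence (e.g., a monotone curve predicted for the given slice).
    """
    recalibrated = np.asarray([curve(c) for c in raw_confidences])
    # Trust predictions above the slice-specific threshold; abstain otherwise.
    decisions = np.where(recalibrated >= threshold, "answer", "abstain")
    return recalibrated, decisions

def example_curve(c, temperature=1.5):
    # Illustrative stand-in curve: soften a probability by dividing its logit
    # by a temperature, which counteracts systematic over-confidence.
    logit = np.log(c) - np.log(1.0 - c)
    return 1.0 / (1.0 + np.exp(-logit / temperature))

raw = [0.55, 0.70, 0.90, 0.97]
recal, dec = recalibrate_and_decide(raw, example_curve, threshold=0.8)
print(list(zip(raw, recal.round(3), dec)))
```

In this toy run, scores that look confident before recalibration (e.g., 0.90) can fall below the slice-specific threshold after remapping, triggering abstention, which is the kind of domain-specific thresholding behavior the abstract describes.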