Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
arXiv (2023)
Abstract
Large language models (LLMs) specializing in natural language generation
(NLG) have recently started exhibiting promising capabilities across a variety
of domains. However, gauging the trustworthiness of responses generated by LLMs
remains an open challenge, with limited research on uncertainty quantification
(UQ) for NLG. Furthermore, existing literature typically assumes white-box
access to language models, which is becoming unrealistic either due to the
closed-source nature of the latest LLMs or computational constraints. In this
work, we investigate UQ in NLG for *black-box* LLMs. We first differentiate
*uncertainty* versus *confidence*: the former refers to the "dispersion" of the
potential predictions for a fixed input, while the latter refers to the
confidence in a particular prediction/generation. We then propose and compare
several confidence/uncertainty measures, applying them to *selective NLG*,
where unreliable results can either be discarded or flagged for further assessment.
Experiments were carried out with several popular LLMs on question-answering
datasets (for evaluation purposes). Results reveal that a simple measure for
the semantic dispersion can be a reliable predictor of the quality of LLM
responses, providing valuable insights for practitioners on uncertainty
management when adopting LLMs. The code to replicate our experiments is
available at https://github.com/zlin7/UQ-NLG.
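To make the idea of "semantic dispersion" concrete, the sketch below estimates uncertainty for a black-box LLM as the average pairwise dissimilarity among several sampled generations for the same prompt, and uses a threshold to decide whether to abstain (selective NLG). This is an illustrative simplification, not the paper's exact measure: the `jaccard_similarity` function here is a toy lexical stand-in for a proper semantic similarity model (e.g., NLI-based equivalence), and the threshold value is arbitrary.

```python
from itertools import combinations

def jaccard_similarity(a: str, b: str) -> float:
    # Toy lexical similarity; a real system would use a semantic
    # similarity or NLI model to compare generations.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def dispersion_uncertainty(generations, sim=jaccard_similarity):
    # Average pairwise dissimilarity among sampled generations:
    # the more the samples disagree, the higher the uncertainty.
    if len(generations) < 2:
        return 0.0
    pairs = list(combinations(generations, 2))
    return sum(1.0 - sim(a, b) for a, b in pairs) / len(pairs)

def selective_answer(generations, threshold=0.5):
    # Selective NLG: return an answer only when dispersion is low;
    # otherwise abstain so the case can be routed for review.
    u = dispersion_uncertainty(generations)
    if u > threshold:
        return None  # abstain
    return generations[0]
```

In practice the generations would come from repeated sampling of the same black-box LLM (e.g., with a nonzero temperature), which requires no access to model internals or token probabilities.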