LUQ: Long-text Uncertainty Quantification for LLMs
CoRR (2024)
Abstract
Large Language Models (LLMs) have demonstrated remarkable capability in a
variety of NLP tasks. Despite their effectiveness, these models are prone to
generating nonfactual content. Uncertainty Quantification (UQ) is pivotal in
enhancing our understanding of a model's confidence in its generated content,
thereby aiding in the mitigation of nonfactual outputs. Existing research on UQ
predominantly targets short text generation, typically yielding brief,
word-limited responses. However, real-world applications frequently necessitate
much longer responses. Our study first highlights the limitations of current UQ
methods in handling long text generation. We then introduce Luq, a
novel sampling-based UQ approach specifically designed for long text. Our
findings reveal that Luq outperforms existing baseline methods in
correlating with the model's factuality scores (a negative correlation
coefficient of -0.85 observed for Gemini Pro). Using Luq as the UQ tool, we
investigate the confidence spectra of several popular LLMs' responses and how
they interplay with the responses' factuality. We find that LLMs lack
confidence when generating long text about rare facts, and that a factually
strong model (e.g., GPT-4) tends to decline questions it is unsure about. To
further improve
the factual accuracy of LLM responses, we propose a method called
Luq-Ensemble that ensembles responses from multiple models and selects
the response with the least uncertainty. The ensembling method greatly improves
the response factuality upon the best standalone LLM.
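The two components named above can be sketched generically: a sampling-based uncertainty score compares a main response against extra sampled responses, and Luq-Ensemble then picks, across models, the response with the lowest uncertainty. This is a minimal illustrative sketch, not the paper's exact method: LUQ itself scores sentence-level consistency with an NLI model, which the `consistency` callable here merely stands in for, and all function names and score values are hypothetical.

```python
from typing import Callable, List, Tuple


def sampling_uncertainty(main_response: str,
                         samples: List[str],
                         consistency: Callable[[str, str], float]) -> float:
    """Generic sampling-based uncertainty: 1 minus the mean consistency
    between the main response and additional sampled responses.
    `consistency` should return a score in [0, 1]; in LUQ this role is
    played by an NLI-based sentence-level consistency measure."""
    scores = [consistency(main_response, s) for s in samples]
    return 1.0 - sum(scores) / len(scores)


def luq_ensemble(candidates: List[Tuple[str, float]]) -> str:
    """Ensemble selection step: given (response, uncertainty) pairs from
    several models, return the response with the least uncertainty."""
    best_response, _ = min(candidates, key=lambda pair: pair[1])
    return best_response


# Toy usage with an exact-match consistency stand-in and made-up scores.
exact_match = lambda a, b: 1.0 if a == b else 0.0
u = sampling_uncertainty("Paris", ["Paris", "Lyon"], exact_match)  # 0.5
chosen = luq_ensemble([("answer from model A", 0.42),
                       ("answer from model B", 0.17),
                       ("answer from model C", 0.31)])
print(u, chosen)
```

Selecting by minimum uncertainty only improves factuality insofar as the uncertainty score correlates with factuality, which is the correlation the paper measures.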