Fine-tuning Language Models for Factuality
ICLR 2024 (2023)
Abstract
The fluency and creativity of large pre-trained language models (LLMs) have
led to their widespread use, sometimes even as a replacement for traditional
search engines. Yet language models are prone to making convincing but
factually inaccurate claims, often referred to as 'hallucinations.' These
errors can inadvertently spread misinformation or harmfully perpetuate
misconceptions. Further, manual fact-checking of model responses is a
time-consuming process, making human factuality labels expensive to acquire. In
this work, we fine-tune language models to be more factual without human
labeling, targeting more open-ended generation settings than past work. We
leverage two key recent innovations in NLP to do so. First, several recent
works have proposed methods for judging the factuality of open-ended text by
measuring consistency with an external knowledge base or simply a large model's
confidence scores. Second, the direct preference optimization algorithm enables
straightforward fine-tuning of language models on objectives other than
supervised imitation, using a preference ranking over possible model responses.
We show that learning from automatically generated factuality preference
rankings, generated either through existing retrieval systems or our novel
retrieval-free approach, significantly improves the factuality (percent of
generated claims that are correct) of Llama-2 on held-out topics compared with
RLHF or decoding strategies targeted at factuality. At 7B scale, compared to
Llama-2-chat, we observe 58% and 40% reductions in factual error rate when
generating biographies and answering medical questions, respectively.
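The pipeline the abstract outlines can be sketched as follows: score sampled responses for factuality, rank them into preferences, then fine-tune with DPO. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each response has already been split into atomic claims, and that each claim has been labeled correct or incorrect by some external judge (per the abstract, either a retrieval-based checker or the model's own confidence). It also uses a hypothetical pairing rule that prefers the higher-scoring response in every pair whose scores differ. `dpo_loss` is the standard per-pair DPO objective on sequence log-probabilities. All function names and the `beta=0.1` default are illustrative choices.

```python
import math
from itertools import combinations
from typing import List, Tuple


def factuality_score(claim_labels: List[bool]) -> float:
    """Fraction of a response's atomic claims judged correct.

    The labels are assumed to come from an external judge (hypothetical here).
    """
    if not claim_labels:
        return 0.0
    return sum(claim_labels) / len(claim_labels)


def build_preference_pairs(
    responses: List[str], scores: List[float]
) -> List[Tuple[str, str]]:
    """Hypothetical pairing rule: for every two responses with different
    scores, emit (preferred, dispreferred), preferring the more factual one.
    Ties produce no pair."""
    pairs = []
    for i, j in combinations(range(len(responses)), 2):
        if scores[i] > scores[j]:
            pairs.append((responses[i], responses[j]))
        elif scores[j] > scores[i]:
            pairs.append((responses[j], responses[i]))
    return pairs


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_logp_w: float,
    policy_logp_l: float,
    ref_logp_w: float,
    ref_logp_l: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one (preferred w, dispreferred l) pair.

    loss = -log sigmoid(beta * [(log pi(w) - log ref(w)) - (log pi(l) - log ref(l))])
    which equals softplus(-margin).
    """
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    return softplus(-margin)


if __name__ == "__main__":
    # Three sampled responses with hypothetical per-claim correctness labels.
    responses = ["r0", "r1", "r2"]
    labels = [[True, True, False], [True, False, False], [True, True, True, True]]
    scores = [factuality_score(l) for l in labels]
    print(scores)
    print(build_preference_pairs(responses, scores))
    # Identical policy and reference log-probs give zero margin, so loss = log 2.
    print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
```

For three sampled responses whose claim labels give scores of 2/3, 1/3 and 1, the pairing rule yields three preference pairs. When the policy and reference assign identical log-probabilities, the DPO loss is log 2; it drops below that as the policy shifts probability toward the preferred (more factual) response relative to the reference model.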
Keywords
factuality, hallucination, language model, DPO