Inference to the Best Explanation in Large Language Models
CoRR (2024)
Abstract
While Large Language Models (LLMs) have found success in real-world
applications, their underlying explanatory process is still poorly understood.
This paper proposes IBE-Eval, a framework inspired by philosophical accounts of
Inference to the Best Explanation (IBE) to advance the interpretation and
evaluation of LLMs' explanations. IBE-Eval estimates the plausibility of
natural language explanations through a combination of explicit logical and
linguistic features, including consistency, parsimony, coherence, and
uncertainty. Extensive experiments are conducted on Causal Question Answering
(CQA), where IBE-Eval is tasked to select the most plausible causal
explanation amongst competing ones generated by LLMs (i.e., GPT-3.5 and Llama
2). The experiments reveal that IBE-Eval can successfully identify the best
explanation with up to 77% accuracy (≈ 27% above random), improving
upon a GPT-3.5-as-a-Judge baseline (≈ +17%) while being intrinsically
more efficient and interpretable. Additional analyses suggest that, despite
model-specific variations, LLM-generated explanations tend to conform to IBE
criteria and that IBE-Eval is significantly correlated with human judgment,
opening up opportunities for future development of automated explanation
verification tools.
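The selection step described above (scoring each competing explanation on several IBE criteria and picking the most plausible one) can be illustrated with a minimal sketch. This is not the IBE-Eval implementation: it assumes the four criterion scores have already been computed and normalized to [0, 1], and it combines them with a hypothetical linear weighting (`WEIGHTS`) chosen purely for illustration. The names `ExplanationFeatures`, `plausibility`, and `select_best` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExplanationFeatures:
    """Hypothetical pre-computed IBE criterion scores for one explanation, each in [0, 1]."""
    consistency: float  # higher = more logically consistent with premise and answer
    parsimony: float    # higher = simpler (fewer assumptions)
    coherence: float    # higher = reasoning steps fit together better
    uncertainty: float  # higher = more hedged language (penalized via a negative weight)


# Hypothetical weights for illustration only; NOT taken from the paper.
WEIGHTS = {"consistency": 1.0, "parsimony": 0.5, "coherence": 0.5, "uncertainty": -0.5}


def plausibility(features: ExplanationFeatures) -> float:
    """Linear combination of the criterion scores under the assumed weights."""
    return sum(weight * getattr(features, name) for name, weight in WEIGHTS.items())


def select_best(candidates: dict[str, ExplanationFeatures]) -> str:
    """Return the key of the candidate explanation with the highest plausibility."""
    return max(candidates, key=lambda key: plausibility(candidates[key]))


# Usage: two competing explanations for a causal question (made-up scores).
candidates = {
    "A": ExplanationFeatures(consistency=1.0, parsimony=0.8, coherence=0.7, uncertainty=0.2),
    "B": ExplanationFeatures(consistency=0.0, parsimony=0.9, coherence=0.9, uncertainty=0.6),
}
best = select_best(candidates)
print(best, round(plausibility(candidates[best]), 2))
```

A linear score keeps each criterion's contribution directly inspectable, which mirrors the interpretability motivation stated in the abstract; in practice, the weights would have to be fitted or justified rather than set by hand as they are here.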