ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models
arXiv (2023)
Abstract
In recent times, large language models (LLMs) have shown impressive
performance on various document-level tasks such as document classification,
summarization, and question-answering. However, research on their ability to
detect self-contradictions in long documents has been very limited. In this
work, we introduce ContraDoc, the first human-annotated
dataset to study self-contradictions in long documents across multiple domains,
varying document lengths, self-contradiction types, and scopes. We then analyze
the current capabilities of four state-of-the-art open-source and commercially
available LLMs: GPT3.5, GPT4, PaLM2, and LLaMAv2 on this dataset. While GPT4
performs the best and can outperform humans on this task, we find that it is
still unreliable and struggles with self-contradictions that require more
nuance and context. We release the dataset and all the code associated with the
experiments (https://github.com/ddhruvkr/CONTRADOC).