ChatQA: Surpassing GPT-4 on Conversational QA and RAG
arXiv (2024)
Abstract
In this work, we introduce ChatQA, a suite of models that outperform GPT-4 on
retrieval-augmented generation (RAG) and conversational question answering
(QA). To enhance generation, we propose a two-stage instruction tuning method
that significantly boosts the performance of RAG. For effective retrieval, we
introduce a dense retriever optimized for conversational QA, which yields
results comparable to the alternative state-of-the-art query rewriting models,
while substantially reducing deployment costs. We also present the ChatRAG
Bench, which encompasses ten datasets covering comprehensive evaluations on
RAG, table-related QA, arithmetic calculations, and scenarios involving
unanswerable questions. Our ChatQA-1.0-70B (score: 54.14), built on Llama2, a
weaker foundation model than GPT-4, can slightly outperform GPT-4-0613 (score:
53.90) and GPT-4-Turbo-2024-04-09 (score: 54.03) on the ChatRAG Bench, without
relying on any synthetic data from OpenAI GPT models. Notably, our
Llama3-ChatQA-1.5-70B model surpasses the accuracy of GPT-4-Turbo-2024-04-09 by
a clear margin. To advance research in this field, we open-source the model weights,
instruction tuning data, ChatRAG Bench, and retriever for the community:
https://chatqa-project.github.io/.