Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
arXiv (2024)
Abstract
Despite the successes of large language models (LLMs), they exhibit
significant drawbacks, particularly when processing long contexts. Their
inference cost scales quadratically with respect to sequence length, making it
expensive for deployment in some real-world text processing applications, such
as retrieval-augmented generation (RAG). Additionally, LLMs also exhibit the
"distraction phenomenon," where irrelevant context in the prompt degrades
output quality. To address these drawbacks, we propose a novel RAG prompting
methodology, superposition prompting, which can be directly applied to
pre-trained transformer-based LLMs without the need for fine-tuning. At a high
level, superposition prompting allows the LLM to process input documents in
parallel prompt paths, discarding paths once they are deemed irrelevant. We
demonstrate the capability of our method to simultaneously enhance time
efficiency across a variety of question-answering benchmarks using multiple
pre-trained LLMs. Furthermore, our technique significantly improves accuracy
when the retrieved context is large relative to the context the model was trained
on. For example, our approach facilitates a 93x reduction in compute time
while improving accuracy by 43% on the NaturalQuestions-Open dataset with the
MPT-7B instruction-tuned model over naive RAG.
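
To make the "parallel prompt paths" idea concrete, the sketch below shows one way such a scheme could look in practice: each retrieved document is placed on its own path, paths are scored for relevance to the query, low-scoring paths are discarded, and the answer is generated from the surviving context only. This is a minimal, hypothetical illustration, not the paper's actual algorithm: the relevance proxy here (mean log-likelihood of the query given the document) and the helper names `score_path` and `superposed_answer` are assumptions, and the paper's own path-scoring and caching machinery is not reproduced.

```python
# Minimal sketch of path-parallel RAG with relevance-based pruning.
# NOT the paper's method; the scoring proxy and helper names are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-7b-instruct"  # one of the models evaluated in the paper
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
model.eval()

def score_path(preamble: str, document: str, query: str) -> float:
    """Proxy relevance score: mean log-likelihood of the query tokens
    conditioned on (preamble + document). This is an assumption, not the
    scoring rule defined in the paper."""
    prefix_ids = tok(preamble + document, return_tensors="pt").input_ids
    query_ids = tok(query, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, query_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Log-probability of each next token given everything before it.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = input_ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the positions that predict the query tokens.
    return token_lp[:, prefix_ids.shape[1] - 1 :].mean().item()

def superposed_answer(preamble, documents, query, keep=2, max_new_tokens=64):
    # 1) Process each retrieved document on its own independent path.
    scored = [(score_path(preamble, d, query), d) for d in documents]
    # 2) Discard paths deemed irrelevant, keeping only the top-`keep`.
    kept = [d for _, d in sorted(scored, key=lambda s: s[0], reverse=True)[:keep]]
    # 3) Generate the answer from the surviving context only.
    prompt = preamble + "\n\n".join(kept) + "\n\nQuestion: " + query + "\nAnswer:"
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
```

Because each path is scored independently of the others, the per-document work can be batched or run in parallel, and pruned documents never enter the final generation prompt, which is the intuition behind both the latency and the distraction-related accuracy gains described above.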