Retrieval-based Full-length Wikipedia Generation for Emergent Events
CoRR (2024)
Abstract
Quickly generating comprehensive and accurate Wikipedia documents for
emerging events is both important and challenging, and demand for it is
growing. However, previous efforts in Wikipedia generation have
often fallen short of meeting real-world requirements. Some approaches focus
solely on generating segments of a complete Wikipedia document, while others
overlook the importance of faithfulness in generation or fail to consider the
influence of the pre-training corpus. In this paper, we simulate a real-world
scenario where structured full-length Wikipedia documents are generated for
emergent events using input retrieved from web sources. To ensure that Large
Language Models (LLMs) have not been trained on corpora covering these
events, we select events that took place recently and introduce a new
benchmark, Wiki-GenBen, which consists of 309 events paired with their
corresponding retrieved web pages for generating evidence. Additionally, we
design a comprehensive set of systematic evaluation metrics and baseline
methods to evaluate the capability of LLMs in generating factual full-length
Wikipedia documents. The data and code are open-sourced at WikiGenBench.