TANQ: An open domain dataset of table answered questions
arxiv(2024)
摘要
Language models, potentially augmented with tool usage such as retrieval are
becoming the go-to means of answering questions. Understanding and answering
questions in real-world settings often requires retrieving information from
different sources, processing and aggregating data to extract insights, and
presenting complex findings in form of structured artifacts such as novel
tables, charts, or infographics. In this paper, we introduce TANQ, the first
open domain question answering dataset where the answers require building
tables from information across multiple sources. We release the full source
attribution for every cell in the resulting table and benchmark
state-of-the-art language models in open, oracle, and closed book setups. Our
best-performing baseline, GPT4 reaches an overall F1 score of 29.1, lagging
behind human performance by 19.7 points. We analyse baselines' performance
across different dataset attributes such as different skills required for this
task, including multi-hop reasoning, math operations, and unit conversions. We
further discuss common failures in model-generated answers, suggesting that
TANQ is a complex task with many challenges ahead.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要