FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering

Wei Zhou, Mohsen Mesgar, Heike Adel, Annemarie Friedrich

NAACL-HLT (2024)

Abstract

Table Question Answering (TQA) aims at composing an answer to a question based on tabular data. While prior research has shown that TQA models lack robustness, understanding the underlying cause and nature of this issue remains predominantly unclear, posing a significant obstacle to the development of robust TQA systems. In this paper, we formalize three major desiderata for a fine-grained evaluation of robustness of TQA systems. They should (i) answer questions regardless of alterations in table structure, (ii) base their responses on the content of relevant cells rather than on biases, and (iii) demonstrate robust numerical reasoning capabilities. To investigate these aspects, we create and publish a novel TQA evaluation benchmark in English. Our extensive experimental analysis reveals that none of the examined state-of-the-art TQA systems consistently excels in these three aspects. Our benchmark is a crucial instrument for monitoring the behavior of TQA systems and paves the way for the development of robust TQA systems. We release our benchmark publicly.
