Towards Unbiased Evaluation of Detecting Unanswerable Questions in EHRSQL
arXiv (2024)
Abstract
Incorporating unanswerable questions into EHR QA systems is crucial for
testing the trustworthiness of a system, as providing non-existent responses
can mislead doctors in their diagnoses. The EHRSQL dataset stands out as a
promising benchmark because it is the only dataset that incorporates
unanswerable questions in the EHR QA system alongside practical questions.
However, in this work, we identify a data bias in these unanswerable questions;
they can often be discerned simply by filtering with specific N-gram patterns.
Such biases jeopardize the authenticity and reliability of QA system
evaluations. To tackle this problem, we propose a simple debiasing method of
adjusting the split between the validation and test sets to neutralize the
undue influence of N-gram filtering. By experimenting on the MIMIC-III dataset,
we demonstrate both the existing data bias in EHRSQL and the effectiveness of
our data split strategy in mitigating this bias.
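
As a rough illustration of the shortcut described above, the following Python sketch flags a question as unanswerable when it contains an N-gram seen only in unanswerable training questions. It is not the authors' implementation: the function names, the whitespace tokenizer, the bigram setting, the `min_count` threshold, and the toy questions are all assumptions made for this example and are not drawn from EHRSQL or MIMIC-III.

```python
from collections import Counter


def ngrams(text, n):
    """Return the set of word-level n-grams in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def learn_patterns(questions, labels, n=2, min_count=2):
    """Collect n-grams found in at least `min_count` unanswerable questions
    and in no answerable question (hypothetical selection rule)."""
    unanswerable_counts = Counter()
    answerable_grams = set()
    for question, is_unanswerable in zip(questions, labels):
        grams = ngrams(question, n)
        if is_unanswerable:
            unanswerable_counts.update(grams)
        else:
            answerable_grams |= grams
    return {
        gram
        for gram, count in unanswerable_counts.items()
        if count >= min_count and gram not in answerable_grams
    }


def flag_unanswerable(question, patterns, n=2):
    """Predict 'unanswerable' if the question contains any learned pattern."""
    return bool(ngrams(question, n) & patterns)


# Toy, made-up questions (not taken from the dataset); True marks unanswerable.
train_questions = [
    "what is the phone number of patient 123",
    "what is the phone number of the ward",
    "what is the age of patient 123",
    "how many patients were admitted last year",
]
train_labels = [True, True, False, False]

patterns = learn_patterns(train_questions, train_labels)
print(sorted(patterns))
print(flag_unanswerable("give me the phone number of the doctor", patterns))
print(flag_unanswerable("what is the age of patient 5", patterns))
```

If a filter this simple separates unanswerable from answerable questions on a held-out split, the split rewards surface patterns rather than genuine detection ability, which is the kind of bias the proposed validation/test re-split is meant to neutralize.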