KPQA: A Metric for Generative Question Answering Using Word Weights
Abstract:
For the automatic evaluation of Generative Question Answering (genQA) systems, it is essential to assess the correctness of the generated answers. However, n-gram similarity metrics, which are widely used to compare generated texts and references, are prone to misjudging fact-based assessments. Moreover, there is a lack of benchmark datas...