Evaluating Question Answering Evaluation
MRQA@EMNLP, pp. 119-124, 2019.
As the complexity of question answering (QA) datasets evolves, moving away from restricted formats like span extraction and multiple-choice (MC) to free-form answer generation, it is imperative to understand how well current metrics perform in evaluating QA. This is especially important as existing metrics (BLEU, ROUGE, METEOR, and F1) are...