An Entailment-Based Approach to the QA4MRE Challenge.

CLEF (Online Working Notes/Labs/Workshop)(2013)

Abstract
This paper describes our entry to the 2012 QA4MRE Main Task (English dataset). The QA4MRE task poses a significant challenge because the way knowledge is expressed in the question and candidate answers typically differs substantially from the way it is expressed in the document. Ultimately, one would need a system that can perform full machine reading, that is, one that creates an internal model of the document's meaning, to achieve high performance. Our approach is a preliminary step toward this, based on estimating the likelihood of textual entailment between sentences in the text and the question Q and each candidate answer Ai. We first treat the question Q and each answer Ai independently, and find sets of sentences SQ, SA that each plausibly entail (the target of) Q or one of the Ai respectively. We then search for the closest (in the document) pair of sentences in these sets, and select the answer Ai entailed by SAi in the closest pair. This approach assumes coherent discourse, i.e., that sentences close together are usually "talking about the same thing", and thus convey a single idea (namely an expression of the Q+Ai pair). In QA4MRE it is hard to "prove" entailment, as a candidate answer A may be expressed in the document using substantially different wording, over multiple sentences, and only partially (as some aspects of A may be left implicit in the document, to be filled in by the reader). As a result, we instead estimate the likelihood of entailment (that a sentence S entails A) by looking for evidence, namely entailment relationships between components of S and A such as words, bigrams, trigrams, and parse fragments. To identify these possible entailment relationships we use three knowledge resources, namely WordNet, ParaPara (a large paraphrase database from Johns Hopkins University), and the DIRT paraphrase database. Our best run scored 40% in the evaluation, and around 42% in additional (unsubmitted) runs afterwards. In ablation studies, we found that the majority of our score (approximately 38%) could be attributed to the basic algorithm, with the knowledge resources adding approximately 4% to this baseline score. Finally, we critique our approach with respect to the broader goal of machine reading, and discuss what is needed to move closer to that goal.
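
The selection procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that entailment likelihood is approximated by exact overlap of words, bigrams, and trigrams only, with no WordNet, ParaPara, DIRT, or parse-fragment matching. The function names, the `threshold` value of 0.2, and the tie-breaking rule (the earlier candidate answer wins) are hypothetical choices made for illustration.

```python
import re
from itertools import product


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(toks, n):
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def entailment_score(sentence, hypothesis, max_n=3):
    """Assumed proxy for entailment likelihood: the fraction of the
    hypothesis's n-grams (n = 1..max_n) that also occur in the sentence."""
    s, h = tokens(sentence), tokens(hypothesis)
    matched = total = 0
    for n in range(1, max_n + 1):
        h_grams = ngrams(h, n)
        total += len(h_grams)
        matched += len(h_grams & ngrams(s, n))
    return matched / total if total else 0.0


def plausible_sentences(sentences, hypothesis, threshold):
    """Indices of sentences whose score reaches the (hypothetical) threshold."""
    return [i for i, s in enumerate(sentences)
            if entailment_score(s, hypothesis) >= threshold]


def choose_answer(sentences, question, answers, threshold=0.2):
    """Pick the answer whose supporting sentence lies closest (by sentence
    index) to a sentence supporting the question. Returns None if no pair
    of supporting sentences exists."""
    s_q = plausible_sentences(sentences, question, threshold)
    best = None  # (distance, answer index); ties go to the earlier answer
    for a_idx, answer in enumerate(answers):
        s_a = plausible_sentences(sentences, answer, threshold)
        for i, j in product(s_q, s_a):
            candidate = (abs(i - j), a_idx)
            if best is None or candidate < best:
                best = candidate
    return None if best is None else answers[best[1]]


if __name__ == "__main__":
    doc = [
        "The river floods the valley every spring.",
        "Farmers build levees to protect their crops.",
        "Tourists visit the old castle in summer.",
        "The castle was built by a duke.",
    ]
    print(choose_answer(doc, "Why do farmers build levees?",
                        ["tourists visit the castle", "to protect their crops"]))
```

In this sketch the distance between two sentences is simply the difference of their positions in the document, which reflects the coherent-discourse assumption that nearby sentences tend to express the same idea. A fuller version would replace the exact n-gram match with lookups in paraphrase and lexical resources, as the abstract describes.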