Corpus Bootstrapping for Assessment of the Properties of Effectiveness Measures

CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 2020 (2020)

Abstract
Bootstrapping is an established tool for examining the behaviour of offline information retrieval (IR) experiments, where it has primarily been used to assess statistical significance and the robustness of significance tests. In this work we consider how bootstrapping can be used to assess the reliability of effectiveness measures for experimental IR. We bootstrap the corpus of documents rather than, as in most prior work, the set of queries. We demonstrate that bootstrapping can provide new insights into the behaviour of effectiveness measures: the precision of a system's measured score on a query can be quantified; some measures are more consistent than others; rankings of systems on a test corpus likewise have a precision (or uncertainty) that can be quantified; and, in experiments with limited volumes of relevance judgements, measures can differ wildly in reliability and precision. Our results show that the uncertainty in measurement and ranking of system performance can be substantial, and thus our approach to corpus bootstrapping provides a key tool for helping experimenters to choose measures and understand reported outcomes.
Keywords
measurement, bootstrap, corpus properties, experimental design