Are IR Evaluation Measures on an Interval Scale?

ICTIR '17: Proceedings of the 2017 ACM SIGIR International Conference on Theory of Information Retrieval (2017)

Abstract
In this paper, we formally investigate whether IR evaluation measures are on an interval scale, a property needed to safely compute the basic statistics, such as mean and variance, that we use daily to compare IR systems. We address this issue within the framework of the representational theory of measurement, relying on the notion of difference structure, i.e., a total equi-spaced ordering on the system runs. We found that the most popular set-based measures, i.e., precision, recall, and F-measure, are on an interval scale. In the case of rank-based measures, using a strongly top-heavy ordering, we found that only RBP with p = 1/2 is on an interval scale, while RBP for other values of p, AP, DCG, and ERR are not. Moreover, using a weakly top-heavy ordering, we found that none of RBP, AP, DCG, and ERR is on an interval scale.
Keywords
evaluation measures, representational theory of measurement, interval scale
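
The abstract's claim about RBP with p = 1/2 can be illustrated with a small sketch. It is not the paper's formal proof: it assumes binary relevance, runs of a fixed length n, the usual RBP formula (1 - p) * sum of r_i * p^(i-1), and that the strongly top-heavy ordering coincides with the lexicographic order on relevance vectors. The helper names `rbp` and `score_gaps` are hypothetical. Under these assumptions, consecutive runs are equally spaced only when p = 1/2.

```python
from fractions import Fraction
from itertools import product


def rbp(run, p):
    """Rank-biased precision of a binary relevance vector (rank 1 first)."""
    return (1 - p) * sum(r * p ** i for i, r in enumerate(run))


def score_gaps(p, n):
    """Set of score differences between consecutive runs of length n.

    Runs are sorted lexicographically, which is ASSUMED here to stand in
    for the strongly top-heavy ordering mentioned in the abstract.
    """
    runs = sorted(product((0, 1), repeat=n))
    scores = [rbp(run, p) for run in runs]
    return {b - a for a, b in zip(scores, scores[1:])}


# Exact rational arithmetic avoids floating-point noise in the comparison.
print(score_gaps(Fraction(1, 2), 3))  # a single gap: equally spaced scores
print(score_gaps(Fraction(1, 3), 3))  # several distinct gaps
```

With p = 1/2 the score of a run is just its relevance vector read as a binary fraction, which is why every step between consecutive runs has the same size; for other values of p the steps differ, and for larger p the scores may not even follow the assumed ordering.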