On the Relation between Assessors' Agreement and Accuracy in Gamified Relevance Assessment

SIGIR '15: The 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, August 2015

Abstract
Expert judgments (labels) are widely used in Information Retrieval for search quality evaluation and machine learning. Setting up the process of collecting such judgments is a challenge of its own, and maintaining judgment quality is an extremely important part of that process. One possible way of controlling quality is to monitor the inter-assessor agreement level. But does the agreement level really reflect the quality of an assessor's judgments? Indeed, if a group of assessors comes to a consensus, to what extent should we trust their collective opinion? In this paper, we investigate whether the agreement level can be used as a metric for estimating the quality of an assessor's judgments, and we provide recommendations for the design of a judgment collection workflow. Specifically, we estimate the correlation between assessors' accuracy and agreement across several workflow designs and investigate which workflow features influence the accuracy of judgments the most.
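
The abstract does not specify how agreement and accuracy are measured. As a minimal illustrative sketch only (not the authors' actual setup), one could take pairwise Cohen's kappa as the agreement measure and per-assessor accuracy against gold labels, then correlate the two across assessors; the label matrix and gold vector below are hypothetical.

import numpy as np
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Hypothetical data: rows = documents, columns = assessors; values = binary relevance labels.
labels = np.array([
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
])
gold = np.array([1, 0, 1, 1, 0])  # trusted gold judgments for the same documents

# Per-assessor accuracy against the gold standard.
accuracy = (labels == gold[:, None]).mean(axis=0)

# Per-assessor agreement: mean Cohen's kappa with every other assessor.
n_assessors = labels.shape[1]
kappa_sum = np.zeros(n_assessors)
for i, j in combinations(range(n_assessors), 2):
    k = cohen_kappa_score(labels[:, i], labels[:, j])
    kappa_sum[i] += k
    kappa_sum[j] += k
agreement = kappa_sum / (n_assessors - 1)

# Correlation between agreement and accuracy across assessors.
corr = np.corrcoef(agreement, accuracy)[0, 1]
print(f"accuracy={accuracy}, agreement={agreement}, corr={corr:.3f}")

With real judgment data, this kind of per-assessor comparison is what would let one check whether high agreement actually coincides with high accuracy under a given workflow design.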
Keywords
Relevance labels, agreement vs. accuracy, judgment collection workflow