Let's agree to disagree: on the evaluation of vocabulary alignment.

K-CAP 2011: Knowledge Capture Conference, Banff, Alberta, Canada, June 2011

Abstract
Gold standard mappings created by experts are at the core of alignment evaluation. At the same time, the process of manual evaluation is rarely discussed. While the practice of having multiple raters evaluate results is accepted, their level of agreement is often not measured. In this paper we describe three experiments in manual evaluation and study the way different raters evaluate mappings. We used alignments generated with different techniques and between vocabularies of different types. In each experiment, five raters evaluated alignments and talked through their decisions using the think-aloud method. In all three experiments we found that inter-rater agreement was low, and we analyzed our data to find the reasons for it. Our analysis shows which variables can be controlled to affect the level of agreement, including the mapping relations, the evaluation guidelines and the background of the raters. On the other hand, differences in the perception of raters, and the complexity of the relations between often ill-defined natural language concepts, remain inherent sources of disagreement. Our results indicate that the manual evaluation of ontology alignments is by no means an easy task and that the ontology alignment community should be careful in the construction and use of reference alignments.
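The abstract stresses that raters' level of agreement is often not measured at all. As an illustration only (the abstract does not name the agreement statistic the authors used), the sketch below computes Fleiss' kappa, a common chance-corrected agreement measure for a fixed panel of raters judging the same set of candidate mappings; the rater judgements and category labels in the example are hypothetical.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of per-mapping rating lists.

    ratings: one inner list per evaluated mapping, each holding the
    category assigned by every rater (e.g. "correct"/"incorrect").
    Every mapping must have been judged by the same number of raters.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({c for item in ratings for c in item})

    # n_ij: how many raters put mapping i into category j
    counts = [Counter(item) for item in ratings]

    # Observed agreement per mapping, averaged over all mappings
    p_bar = sum(
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items

    # Expected agreement from the marginal category proportions
    p_e = sum(
        (sum(c[cat] for c in counts) / (n_items * n_raters)) ** 2
        for cat in categories
    )
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical example: five raters judging four candidate mappings
judgements = [
    ["correct", "correct", "correct", "incorrect", "correct"],
    ["incorrect", "incorrect", "correct", "incorrect", "incorrect"],
    ["correct", "incorrect", "incorrect", "incorrect", "correct"],
    ["correct", "correct", "correct", "correct", "correct"],
]
print(f"Fleiss' kappa: {fleiss_kappa(judgements):.3f}")
```

A value near 1 indicates near-perfect agreement, values near 0 indicate agreement no better than chance; the paper's finding of low inter-rater agreement corresponds to values toward the lower end of this scale.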