Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations
CoRR (2024)
Abstract
Named Entity Recognition (NER) is a key information extraction task with a
long-standing tradition. While recent studies address and aim to correct
annotation errors via re-labeling efforts, little is known about the sources of
human label variation, such as text ambiguity, annotation error, or guideline
divergence. This is especially the case for high-quality datasets and beyond
English CoNLL03. This paper studies disagreements in expert-annotated named
entity datasets for three languages: English, Danish, and Bavarian. We show
that text ambiguity and artificial guideline changes are dominant factors for
diverse annotations among high-quality revisions. We survey student annotations
on a subset of difficult entities and substantiate the feasibility and
necessity of manifold annotations for understanding named entity ambiguities
from a distributional perspective.