From Labels to Decisions: A Mapping-Aware Annotator Model.

Evan Yao, Jagdish Ramakrishnan, Xu Chen, Viet-An Nguyen, Udi Weinsberg

KDD (2023)


Abstract

Online platforms regularly rely on human annotators to make real-time operational decisions for tasks such as content moderation. While crowdsourcing models have been proposed for aggregating noisy labels, they do not generalize well when annotators produce labels in a large space, e.g., labels generated from complex review trees. We study a novel crowdsourcing setting with D possible operational decisions or outcomes, in which annotators produce labels in a larger space of size L > D that are mapped to decisions through a known mapping function. For content moderation, such labels can correspond to violation reasons (e.g., nudity, violence), while the space of decisions is binary: remove the content or keep it up. In this setting, it is more important to make the right decision than to estimate the correct underlying label. Existing methods typically separate the label-to-decision mapping from the modeling of annotators, leading to sub-optimal statistical inference efficiency and excessive computational complexity. We propose a novel confusion matrix model for each annotator that leverages this mapping. Our model is parameterized in a hierarchical manner, with population parameters shared across annotators to model shared confusions and individual parameters to admit heterogeneity among annotators. With extensive numerical experiments, we demonstrate that the proposed model substantially improves accuracy over existing methods and scales well for moderate and large L. In a real-world application on content moderation at Meta, the proposed method offers a 13% improvement in AUC over prior methods, including Meta's existing model in production.
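
To make the setting concrete, the following minimal sketch shows why the decision can matter more than the label. It is not the authors' model: it only contrasts two naive plurality-vote aggregations under an assumed mapping. The label names (including `hate_speech`), the mapping `LABEL_TO_DECISION`, and the helper functions are hypothetical and chosen for illustration only.

```python
from collections import Counter

# Hypothetical label space (L = 4) with a known mapping to D = 2 decisions.
LABEL_TO_DECISION = {
    "nudity": "remove",
    "violence": "remove",
    "hate_speech": "remove",
    "benign": "keep",
}


def plurality(votes):
    """Return the most frequent item in votes."""
    return Counter(votes).most_common(1)[0][0]


def decide_via_label(labels):
    """Estimate the label first, then map that single label to a decision."""
    return LABEL_TO_DECISION[plurality(labels)]


def decide_via_mapping(labels):
    """Map every annotator's label to a decision first, then aggregate."""
    return plurality([LABEL_TO_DECISION[label] for label in labels])


# Five annotators: three flag a violation but disagree on the reason.
votes = ["nudity", "violence", "hate_speech", "benign", "benign"]
label_first = decide_via_label(votes)
decision_first = decide_via_mapping(votes)
print(label_first, decision_first)
```

In this toy case the label-first route returns `keep`, because `benign` is the single most common label, while the mapping-aware route returns `remove`, because three of the five annotators chose a label that maps to removal. The paper's approach goes further by building the mapping into each annotator's confusion matrix rather than relying on vote counting.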


Keywords

Crowdsourcing, confusion matrix, content moderation
