Bringing the People Back In: Contesting Benchmark Machine Learning Datasets
arXiv (2020)
Abstract
In response to algorithmic unfairness embedded in sociotechnical systems,
significant attention has been focused on the contents of machine learning
datasets which have revealed biases towards white, cisgender, male, and Western
data subjects. In contrast, comparatively less attention has been paid to the
histories, values, and norms embedded in such datasets. In this work, we
outline a research program - a genealogy of machine learning data - for
investigating how and why these datasets have been created, what and whose
values influence the choices of data to collect, and the contextual and contingent
conditions of their creation. We describe the ways in which benchmark datasets
in machine learning operate as infrastructure and pose four research questions
for these datasets. This interrogation forces us to "bring the people back in"
by aiding us in understanding the labor embedded in dataset construction, and
thereby presenting new avenues of contestation for other researchers
encountering the data.