A Canary in the AI Coal Mine: American Jews May Be Disproportionately Harmed by Intellectual Property Dispossession in Large Language Model Training.
CHI '24 Proceedings of the CHI Conference on Human Factors in Computing Systems (2024)
Abstract
Systemic property dispossession from minority groups has often been carried out in the name of technological progress. In this paper, we identify evidence that the current paradigm of large language models (LLMs) likely continues this long history. Examining common LLM training datasets, we find that a disproportionate amount of content authored by Jewish Americans is used for training without their consent. The degree of over-representation ranges from around 2x to around 6.5x. Given that LLMs may substitute for the paid labor of those who produced their training data, they have the potential to cause even more substantial and disproportionate economic harm to Jewish Americans in the coming years. This paper focuses on Jewish Americans as a case study, but it is probable that other minority communities (e.g., Asian Americans, Hindu Americans) may be similarly affected and, most importantly, the results should likely be interpreted as a "canary in the coal mine" that highlights deep structural concerns about the current LLM paradigm whose harms could soon affect nearly everyone. We discuss the implications of these results for the policymakers thinking about how to regulate LLMs as well as for those in the AI field who are working to advance LLMs. Our findings stress the importance of working together towards alternative LLM paradigms that avoid both disparate impacts and widespread societal harms.