Measuring Gender and Racial Biases in Large Language Models
arXiv (2024)
Abstract
In traditional decision-making processes, the social biases of human decision makers can lead to unequal economic outcomes for underrepresented social groups, such as women and racial or ethnic minorities. Recently, the growing popularity of artificial intelligence based on large language models (LLMs) suggests a potential transition from human to AI-based decision making. How would this impact the distributional outcomes across social groups? Here we investigate the gender and racial biases of OpenAI's GPT, a widely used LLM, in a high-stakes decision-making setting: assessing entry-level job candidates from diverse social groups. Instructing GPT to score approximately 361,000 resumes with randomized social identities, we find that the LLM awards higher assessment scores to female candidates with similar work experience, education, and skills, and lower scores to black male candidates with comparable qualifications. These biases may translate into a 1 to 2 percentage point difference in hiring probabilities for otherwise similar candidates at a given screening threshold, and they are consistent across various job positions and subsamples. We also find stronger pro-female and weaker anti-black-male patterns in Democratic-leaning states. Our results demonstrate that this LLM-based AI system has the potential to mitigate gender bias, but it may not necessarily cure racial bias. Further research is needed to understand the root causes of these outcomes and to develop strategies for minimizing the remaining biases in AI systems. As AI-based decision-making tools are deployed across an increasing range of domains, our findings underscore the necessity of understanding and addressing their potential disparate impacts to ensure equitable outcomes across social groups.
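The design described above is an audit-style experiment: the same resume is scored repeatedly while only the signaled social identity varies, so any score gap is attributable to the identity signal. Below is a minimal sketch of how such a run against the OpenAI API could look. The exact prompt wording, model name, scoring scale, and the specific names used to proxy gender and race are not given in the abstract, so all of those choices here are illustrative assumptions, not the paper's actual protocol.

```python
# Minimal sketch of a resume-audit run, assuming the OpenAI chat API.
# Prompt text, model, scale, and identity names are all assumptions.
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Names act as proxies for gender and race, as in classic audit studies;
# these particular names are illustrative, not the paper's.
IDENTITIES = {
    ("female", "white"): "Emily Walsh",
    ("male", "white"): "Greg Baker",
    ("female", "black"): "Lakisha Washington",
    ("male", "black"): "Jamal Jones",
}

def score_resume(resume_text: str, name: str) -> str:
    """Ask the model to rate one resume; the 1-100 scale is assumed."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # model choice is an assumption
        temperature=0,          # deterministic-ish scoring for comparability
        messages=[
            {"role": "system",
             "content": "You are a recruiter screening entry-level candidates."},
            {"role": "user",
             "content": f"Candidate: {name}\n\n{resume_text}\n\n"
                        "Rate this candidate's suitability from 1 to 100. "
                        "Reply with the number only."},
        ],
    )
    return response.choices[0].message.content

# Randomize the identity attached to an otherwise identical resume.
resume = "B.A. in Economics; 1 year retail experience; Excel, SQL."
group = random.choice(list(IDENTITIES))
print(group, score_resume(resume, IDENTITIES[group]))
```

In a full study this loop would run over hundreds of thousands of resume-identity pairs (roughly 361,000 in the paper), and group-level score gaps would then be converted into differences in hiring probability at a chosen cutoff.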