Challenging the Validity of Personality Tests for Large Language Models
arXiv (Cornell University), 2023
Abstract
With large language models (LLMs) like GPT-4 appearing to behave increasingly
human-like in text-based interactions, it has become popular to attempt to
evaluate personality traits of LLMs using questionnaires originally developed
for humans. While reusing measures is a resource-efficient way to evaluate
LLMs, careful adaptations are usually required to ensure that assessment
results are valid even across human subpopulations. In this work, we provide
evidence that LLMs' responses to personality tests systematically deviate from
human responses, implying that the results of these tests cannot be interpreted
in the same way as human results. Concretely, both items of a reverse-coded pair
("I am introverted" vs. "I am extraverted") are often answered affirmatively.
Furthermore, variation across prompts designed to "steer" LLMs to simulate
particular personality types does not follow the clear separation into five
independent personality factors observed in human samples. In light of these
results, we believe that it is
important to investigate tests' validity for LLMs before drawing strong
conclusions about potentially ill-defined concepts like LLMs' "personality".
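Reverse-coded items are normally scored by mirroring the response on the scale, so that agreeing with "I am introverted" counts toward a low extraversion score. The Python sketch below illustrates why endorsing both items of such a pair undermines the score: the mirrored responses cancel out and the pair looks like a neutral answer. It is not the authors' analysis; the 1 to 5 Likert scale, the agreement threshold of 4, and all function names are hypothetical choices made for illustration.

```python
# Hypothetical 1-5 Likert scale: 1 = strongly disagree, 5 = strongly agree.
SCALE_MIN, SCALE_MAX = 1, 5


def reverse_score(response: int) -> int:
    """Mirror a response to a reverse-coded item onto the trait direction."""
    return SCALE_MIN + SCALE_MAX - response


def is_contradictory(item: int, reversed_item: int, agree_threshold: int = 4) -> bool:
    """True if both an item and its reverse-coded counterpart are endorsed."""
    return item >= agree_threshold and reversed_item >= agree_threshold


def trait_score(item: int, reversed_item: int) -> float:
    """Average the item with the mirrored reverse-coded item."""
    return (item + reverse_score(reversed_item)) / 2


def contradiction_rate(pairs) -> float:
    """Fraction of (item, reversed_item) pairs where both are endorsed."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(is_contradictory(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # ("I am extraverted", "I am introverted") responses.
    consistent = (5, 1)    # a typical human-like pattern
    contradictory = (5, 5)  # both statements endorsed
    print(trait_score(*consistent))     # 5.0: clearly extraverted
    print(trait_score(*contradictory))  # 3.0: masquerades as "neutral"
    print(contradiction_rate([consistent, contradictory]))  # 0.5
```

The point of the sketch is that an averaged trait score alone cannot distinguish a genuinely moderate respondent from one that agrees with everything, which is why checking item-level consistency matters before interpreting the score.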
Keywords
personality tests, large language models, language models