OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models
CoRR (2024)
Abstract
Neural Theory-of-Mind (N-ToM), a machine's ability to understand and keep track
of the mental states of others, is pivotal in developing socially intelligent
agents. However, prevalent N-ToM benchmarks have several shortcomings,
including the presence of ambiguous and artificial narratives, absence of
personality traits and preferences, a lack of questions addressing characters'
psychological mental states, and limited diversity in the questions posed. In
response to these issues, we construct OpenToM, a new benchmark for assessing
N-ToM with (1) longer and clearer narrative stories, (2) characters with
explicit personality traits, (3) actions that are triggered by character
intentions, and (4) questions designed to challenge LLMs' ability to
model characters' mental states in both the physical and psychological
world. Using OpenToM, we reveal that state-of-the-art LLMs thrive at modeling
certain aspects of mental states in the physical world but fall short when
tracking characters' mental states in the psychological world.