NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding
arXiv (2024)
Abstract
Large Language Models (LLMs) have sparked substantial interest and debate
concerning the potential emergence of Theory of Mind (ToM) ability. Current ToM
evaluations test models on machine-generated data or game settings that are
prone to shortcuts and spurious correlations, and so do not assess machine ToM
ability in real-world human interaction scenarios. This creates a pressing need
for new benchmarks built on real-world scenarios. We
introduce NegotiationToM, a new benchmark designed to stress-test machine ToM
in real-world negotiation settings, covering multi-dimensional mental states
(i.e., desires, beliefs, and intentions). The benchmark builds upon the
Belief-Desire-Intention (BDI) agent modeling theory, and we conduct empirical
experiments on it to evaluate large language models. Our findings
demonstrate that NegotiationToM is challenging for state-of-the-art LLMs, as
they consistently perform significantly worse than humans, even when employing
the chain-of-thought (CoT) method.