Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation
CoRR (2024)
Abstract
Storytelling is an integral part of human experience and plays a crucial role
in social interactions. Thus, Automatic Story Evaluation (ASE) and Generation
(ASG) could benefit society in multiple ways, but they are challenging tasks
which require high-level human abilities such as creativity, reasoning and deep
understanding. Meanwhile, Large Language Models (LLMs) now achieve
state-of-the-art performance on many NLP tasks. In this paper, we study whether
LLMs can be used as substitutes for human annotators for ASE. We perform an
extensive analysis of the correlations between LLM ratings, other automatic
measures, and human annotations, and we explore the influence of prompting on
the results and the explainability of LLM behaviour. Most notably, we find that
LLMs outperform current automatic measures for system-level evaluation but
still struggle to provide satisfactory explanations for their answers.