StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning
Hong Chen, Duc Vo, Hiroya Takamura, Yusuke Miyao, Hideki Nakayama
Abstract
Existing automatic story evaluation methods place a premium on the lexical-level coherence of a story, deviating from human preference. We go beyond this limitation by considering a novel Story Evaluation method that mimics human preference when judging a story, namely StoryER, which consists of three sub-tasks: Ranking, Rating and Reasoning. Given either a machine-generated or a human-written story, StoryER requires the machine to output 1) a preference score that corresponds to human preference, 2) specific ratings and their corresponding confidences, and 3) comments for various aspects (e.g., opening, character-shaping). To support these tasks, we introduce a well-annotated dataset comprising (i) 100k ranked story pairs; and (ii) a set of 46k ratings and comments on various aspects of the story. We finetune Longformer-Encoder-Decoder (LED) on the collected dataset, with the encoder responsible for preference score and aspect prediction and the decoder for comment generation. Our comprehensive experiments result in a competitive benchmark for each task, showing high correlation with human preference. In addition, we observe that joint learning of the preference scores, the aspect ratings, and the comments brings gains on each individual task. Our dataset and benchmarks are publicly available to advance research on story evaluation tasks.
- Anthology ID:
- 2022.emnlp-main.114
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1739–1753
- URL:
- https://aclanthology.org/2022.emnlp-main.114
- DOI:
- 10.18653/v1/2022.emnlp-main.114
- Cite (ACL):
- Hong Chen, Duc Vo, Hiroya Takamura, Yusuke Miyao, and Hideki Nakayama. 2022. StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 1739–1753, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning (Chen et al., EMNLP 2022)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2022.emnlp-main.114.pdf
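
The abstract describes learning a preference score from ranked story pairs but does not say which objective is used. As a minimal sketch, assuming a hinge-style margin ranking loss (one common choice for pairwise data, not confirmed by the abstract), the Ranking sub-task could be illustrated as follows. The function names, the margin value, and the example scores are all hypothetical.

```python
from typing import List, Tuple


def margin_ranking_loss(score_preferred: float, score_other: float,
                        margin: float = 1.0) -> float:
    """Hinge-style pairwise loss (assumed objective, not taken from the paper).

    The loss is zero once the preferred story outscores the other
    story by at least `margin`; otherwise it grows linearly.
    """
    return max(0.0, margin - (score_preferred - score_other))


def pairwise_accuracy(pairs: List[Tuple[float, float]]) -> float:
    """Fraction of pairs where the preferred story has the strictly higher score."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    correct = sum(1 for preferred, other in pairs if preferred > other)
    return correct / len(pairs)


# Hypothetical model scores: (score of human-preferred story, score of the other story).
pairs = [(2.0, 0.5), (0.3, 0.9), (1.2, 1.0)]

losses = [margin_ranking_loss(p, o) for p, o in pairs]
mean_loss = sum(losses) / len(losses)

print(f"mean margin ranking loss: {mean_loss:.3f}")
print(f"pairwise accuracy: {pairwise_accuracy(pairs):.3f}")
```

In this sketch the second pair is ranked incorrectly and the third is ranked correctly but by less than the margin, so both contribute a non-zero loss. In practice the scores would come from a model head, such as the LED encoder the abstract mentions, rather than from fixed numbers.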