LFED: A Literary Fiction Evaluation Dataset for Large Language Models

Linhao Yu; Qun Liu; Deyi Xiong

Abstract

The rapid evolution of large language models (LLMs) has created a need for comprehensive assessments of their performance across various dimensions. In this paper, we propose LFED, a Literary Fiction Evaluation Dataset, which aims to evaluate the capability of LLMs in long fiction comprehension and reasoning. We collect 95 works of literary fiction that are either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries. We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions. Additionally, we conduct an in-depth analysis of how specific attributes of these works (e.g., novel type, number of characters, year of publication) affect LLM performance. Experiments with various state-of-the-art LLMs show that these models face considerable challenges in answering questions about literary fiction, with ChatGPT reaching only 57.08% under the zero-shot setting. The dataset will be publicly available at https://github.com/tjunlp-lab/LFED.git.

Anthology ID: 2024.lrec-main.915
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 10466–10475
URL: https://aclanthology.org/2024.lrec-main.915
Cite (ACL): Linhao Yu, Qun Liu, and Deyi Xiong. 2024. LFED: A Literary Fiction Evaluation Dataset for Large Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 10466–10475, Torino, Italia. ELRA and ICCL.
Cite (Informal): LFED: A Literary Fiction Evaluation Dataset for Large Language Models (Yu et al., LREC-COLING 2024)
PDF: https://preview.aclanthology.org/nschneid-patch-4/2024.lrec-main.915.pdf