Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework

Esteban Garces Arias; Hannah Blocher; Julian Rodemann; Meimingwei Li; Christian Heumann; Matthias Aßenmacher

Abstract

Open-ended text generation has become a prominent task in natural language processing due to the rise of powerful (large) language models. However, evaluating the quality of these models and the employed decoding strategies remains challenging due to trade-offs among widely used metrics such as coherence, diversity, and perplexity. This paper addresses the specific problem of multicriteria evaluation for open-ended text generation, proposing novel methods for both relative and absolute rankings of decoding methods. Specifically, we employ benchmarking approaches based on partial orderings and present a new summary metric to balance existing automatic indicators, providing a more holistic evaluation of text generation quality. Our experiments demonstrate that the proposed approaches offer a robust way to compare decoding strategies and serve as valuable tools to guide model selection for open-ended text generation tasks. We suggest future directions for improving evaluation methodologies in text generation and make our code, datasets, and models publicly available.
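To illustrate what ranking decoding methods by a partial ordering means, the following is a minimal sketch of Pareto dominance. Pareto dominance is one common way to induce a partial order over methods that are scored on several metrics. It is not the authors' benchmarking procedure, and it is not their summary metric. The method names ("A", "B", "C"), the toy scores, and the helper names (`dominates`, `pareto_front`, `incomparable_pairs`) are hypothetical. The sketch also assumes that every metric has already been oriented so that higher is better; a metric such as perplexity would need to be transformed first.

```python
from itertools import combinations

# Hypothetical toy scores (NOT results from the paper): each method maps to a
# tuple of metric values, all assumed oriented so that higher is better.
scores = {
    "A": (0.6, 0.8),
    "B": (0.5, 0.7),
    "C": (0.7, 0.4),
}


def dominates(a, b):
    """True if a is at least as good as b on every metric and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Return the methods that no other method dominates (maximal elements)."""
    return sorted(
        m for m, s in scores.items()
        if not any(dominates(t, s) for n, t in scores.items() if n != m)
    )


def incomparable_pairs(scores):
    """Pairs where neither method dominates the other: the order is only partial."""
    return [
        (m, n)
        for (m, s), (n, t) in combinations(sorted(scores.items()), 2)
        if not dominates(s, t) and not dominates(t, s)
    ]


if __name__ == "__main__":
    print(pareto_front(scores))        # ['A', 'C']
    print(incomparable_pairs(scores))  # [('A', 'C'), ('B', 'C')]
```

In this example, A dominates B, but C is incomparable with both A and B. A partial order therefore leaves some pairs unranked. This is the trade-off among metrics that motivates the multicriteria evaluation described above.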

Anthology ID: 2025.gem-1.59
Volume: Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
Month: July
Year: 2025
Address: Vienna, Austria and virtual meeting
Editors: Kaustubh Dhole, Miruna Clinciu
Venues: GEM | WS
Publisher: Association for Computational Linguistics
Pages: 631–654
URL: https://preview.aclanthology.org/corrections-2025-08/2025.gem-1.59/
Cite (ACL): Esteban Garces Arias, Hannah Blocher, Julian Rodemann, Meimingwei Li, Christian Heumann, and Matthias Aßenmacher. 2025. Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework. In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²), pages 631–654, Vienna, Austria and virtual meeting. Association for Computational Linguistics.
Cite (Informal): Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework (Garces Arias et al., GEM 2025)
PDF: https://preview.aclanthology.org/corrections-2025-08/2025.gem-1.59.pdf