Evaluation of Generated Poetry

David Mareček, Kateřina Motalík Hodková, Tomáš Musil, Rudolf Rosa


Abstract
We propose a range of automated metrics for the evaluation of generated poetry. The metrics measure various aspects of poetry: rhyming, metre, syntax, semantics, and the amount of unknown words. In a case study, we implement the metrics for the Czech language, apply them to poetry generated by several automated systems as well as to human-written poetry, and correlate them with human judgment. We find that most of the proposed metrics correlate well with the corresponding human evaluation, but semantically oriented metrics are much better predictors of the overall impression than metrics evaluating formal properties.
Anthology ID:
2025.eval4nlp-1.9
Volume:
Proceedings of the 5th Workshop on Evaluation and Comparison of NLP Systems
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Mousumi Akter, Tahiya Chowdhury, Steffen Eger, Christoph Leiter, Juri Opitz, Erion Çano
Venues:
Eval4NLP | WS
Publisher:
Association for Computational Linguistics
Pages:
109–118
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.eval4nlp-1.9/
Cite (ACL):
David Mareček, Kateřina Motalík Hodková, Tomáš Musil, and Rudolf Rosa. 2025. Evaluation of Generated Poetry. In Proceedings of the 5th Workshop on Evaluation and Comparison of NLP Systems, pages 109–118, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):
Evaluation of Generated Poetry (Mareček et al., Eval4NLP 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.eval4nlp-1.9.pdf