Unveiling the Essence of Poetry: Introducing a Comprehensive Dataset and Benchmark for Poem Summarization

Ridwan Mahbub, Ifrad Khan, Samiha Anuva, Md Shihab Shahriar, Md Tahmid Rahman Laskar, Sabbir Ahmed


Abstract
While research in natural language processing has progressed significantly in creative language generation, the question of whether language models can interpret the intended meaning of creative language remains largely unanswered. Poetry as a creative art form has existed for generations, and summarizing such content requires deciphering its figurative patterns to uncover the actual intent and message of the poet. This task gives researchers an opportunity to evaluate how well language models interpret creative language. Unlike typical text, poems are challenging to summarize because they carry a deeper meaning, which can easily be lost if only the literal meaning is considered. To this end, we propose a new task in the field of natural language understanding called ‘Poem Summarization’. As a starting point, we introduce the first-ever dataset for this task, named ‘PoemSum’, consisting of 3011 English-language poems, each paired with a corresponding summarized interpretation. We benchmark the performance of different state-of-the-art summarization models and provide observations on their limitations. The dataset and all relevant code used in this work have been made publicly available.
Anthology ID: 2023.emnlp-main.920
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 14878–14886
URL: https://aclanthology.org/2023.emnlp-main.920
DOI: 10.18653/v1/2023.emnlp-main.920
Cite (ACL): Ridwan Mahbub, Ifrad Khan, Samiha Anuva, Md Shihab Shahriar, Md Tahmid Rahman Laskar, and Sabbir Ahmed. 2023. Unveiling the Essence of Poetry: Introducing a Comprehensive Dataset and Benchmark for Poem Summarization. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 14878–14886, Singapore. Association for Computational Linguistics.
Cite (Informal): Unveiling the Essence of Poetry: Introducing a Comprehensive Dataset and Benchmark for Poem Summarization (Mahbub et al., EMNLP 2023)
PDF: https://preview.aclanthology.org/ingest-2024-clasp/2023.emnlp-main.920.pdf
Video: https://preview.aclanthology.org/ingest-2024-clasp/2023.emnlp-main.920.mp4