Unveiling the Essence of Poetry: Introducing a Comprehensive Dataset and Benchmark for Poem Summarization
Ridwan Mahbub, Ifrad Khan, Samiha Anuva, Md Shihab Shahriar, Md Tahmid Rahman Laskar, Sabbir Ahmed
Abstract
While research in natural language processing has progressed significantly in creative language generation, the question of whether language models can interpret the intended meaning of creative language largely remains unanswered. Poetry as a creative art form has existed for generations, and summarization of such content requires deciphering the figurative patterns to find out the actual intent and message of the poet. This task can provide researchers with an opportunity to evaluate the creative language interpretation capacity of language models. Unlike typical text, summarization of poems is a challenging task as poems carry a deeper meaning, which can be easily lost if only the literal meaning is considered. That being said, we propose a new task in the field of natural language understanding called ‘Poem Summarization’. As a starting point, we propose the first-ever dataset for this task, named ‘PoemSum’, consisting of 3011 samples of poetry and their corresponding summarized interpretations in the English language. We have benchmarked the performance of different state-of-the-art summarization models and provided observations on their limitations. The dataset and all relevant code used in this work have been made publicly available.
- Anthology ID: 2023.emnlp-main.920
- Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 14878–14886
- URL: https://aclanthology.org/2023.emnlp-main.920
- DOI: 10.18653/v1/2023.emnlp-main.920
- Cite (ACL): Ridwan Mahbub, Ifrad Khan, Samiha Anuva, Md Shihab Shahriar, Md Tahmid Rahman Laskar, and Sabbir Ahmed. 2023. Unveiling the Essence of Poetry: Introducing a Comprehensive Dataset and Benchmark for Poem Summarization. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 14878–14886, Singapore. Association for Computational Linguistics.
- Cite (Informal): Unveiling the Essence of Poetry: Introducing a Comprehensive Dataset and Benchmark for Poem Summarization (Mahbub et al., EMNLP 2023)
- PDF: https://preview.aclanthology.org/ingest-2024-clasp/2023.emnlp-main.920.pdf