Advances and Challenges in the Automatic Identification of Indirect Quotations in Scholarly Texts and Literary Works

Frederik Arnold, Robert Jäschke, Philip Kraut


Abstract
Literary scholars commonly refer to the literary work they interpret using various types of quotations. Two main categories are direct and indirect quotations. In this work, we focus on the automatic identification of two subtypes of indirect quotations: paraphrases and summaries. Our contributions are twofold. First, we present a dataset of scholarly works annotated with the text spans that summarize or paraphrase the interpreted drama and with the source of each quotation. Second, we present a two-step approach to this task. Because we found annotating large training corpora very time-consuming, we leverage GPT-generated summaries to create training data for our approach.
Anthology ID:
2025.nlp4dh-1.15
Volume:
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
Month:
May
Year:
2025
Address:
Albuquerque, USA
Editors:
Mika Hämäläinen, Emily Öhman, Yuri Bizzoni, So Miyagawa, Khalid Alnajjar
Venues:
NLP4DH | WS
Publisher:
Association for Computational Linguistics
Pages:
179–190
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.nlp4dh-1.15/
Cite (ACL):
Frederik Arnold, Robert Jäschke, and Philip Kraut. 2025. Advances and Challenges in the Automatic Identification of Indirect Quotations in Scholarly Texts and Literary Works. In Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities, pages 179–190, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal):
Advances and Challenges in the Automatic Identification of Indirect Quotations in Scholarly Texts and Literary Works (Arnold et al., NLP4DH 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.nlp4dh-1.15.pdf