Philip Kraut


2025

pdf bib
Advances and Challenges in the Automatic Identification of Indirect Quotations in Scholarly Texts and Literary Works
Frederik Arnold | Robert Jäschke | Philip Kraut
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities

Literary scholars commonly refer to the interpreted literary work using various types of quotations. Two main categories are direct and indirect quotations. In this work we focus on the automatic identification of two subtypes of indirect quotations: paraphrases and summaries. Our contributions are twofold. First, we present a dataset of scholarly works with annotations of text spans which summarize or paraphrase the interpreted drama and the source of the quotation. Second, we present a two-step approach to solve the task at hand. We found the process of annotating large training corpora very time consuming and therefore leverage GPT-generated summaries to generate training data for our approach.