Frederik Arnold
2025
Advances and Challenges in the Automatic Identification of Indirect Quotations in Scholarly Texts and Literary Works
Frederik Arnold | Robert Jäschke | Philip Kraut
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
Literary scholars commonly refer to the interpreted literary work using various types of quotations. The two main categories are direct and indirect quotations. In this work, we focus on the automatic identification of two subtypes of indirect quotations: paraphrases and summaries. Our contributions are twofold. First, we present a dataset of scholarly works with annotations of text spans which summarize or paraphrase the interpreted drama and the source of the quotation. Second, we present a two-step approach to solve the task at hand. We found the process of annotating large training corpora very time-consuming and therefore leverage GPT-generated summaries to create training data for our approach.
2021
Lotte and Annette: A Framework for Finding and Exploring Key Passages in Literary Works
Frederik Arnold | Robert Jäschke
Proceedings of the Workshop on Natural Language Processing for Digital Humanities
We present an approach that leverages expert knowledge contained in scholarly works to automatically identify key passages in literary works. Specifically, we extend a text reuse detection method for finding quotations, such that our system Lotte can deal with common properties of quotations, for example, ellipses or inaccurate quotations. An evaluation shows that Lotte outperforms four existing approaches. To generate key passages, we combine overlapping quotations from multiple scholarly texts. An interactive website, called Annette, for visualizing and exploring key passages makes the results accessible and explorable.
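The abstract does not say how overlapping quotations are combined, so the following is only a minimal sketch of one plausible reading: quotations are treated as character-offset spans in the literary work, and spans that overlap are merged into a single candidate key passage. The span representation, the function name `merge_overlapping`, and the per-passage count are assumptions made for illustration and are not taken from Lotte.

```python
from typing import List, Tuple

# Assumption: a quotation is a (start, end) pair of character offsets in the
# literary work, with the end offset exclusive.
Span = Tuple[int, int]


def merge_overlapping(spans: List[Span]) -> List[Tuple[Span, int]]:
    """Merge overlapping spans.

    Returns each merged span together with the number of input spans that
    were folded into it (hypothetical stand-in for how often a passage is quoted).
    """
    merged: List[Tuple[Span, int]] = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][0][1]:
            # The span starts before the previous merged span ends: extend it.
            (m_start, m_end), count = merged[-1]
            merged[-1] = ((m_start, max(m_end, end)), count + 1)
        else:
            # No overlap with the previous span: start a new passage.
            merged.append(((start, end), 1))
    return merged
```

For example, `merge_overlapping([(10, 50), (40, 80), (200, 240)])` folds the first two spans into one passage covering offsets 10 to 80 and keeps the third as a separate passage. Spans that merely touch (one ends exactly where the next begins) are kept apart in this sketch; whether they should be merged is a design choice.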