Frederik Arnold


2025

Literary scholars commonly refer to the literary work they interpret using various types of quotations. The two main categories are direct and indirect quotations. In this work, we focus on the automatic identification of two subtypes of indirect quotations: paraphrases and summaries. Our contributions are twofold. First, we present a dataset of scholarly works annotated with text spans that summarize or paraphrase the interpreted drama and with the source of each quotation. Second, we present a two-step approach to this task. Because annotating large training corpora proved very time-consuming, we use GPT-generated summaries to create training data for our approach.

2021

We present an approach that leverages expert knowledge contained in scholarly works to automatically identify key passages in literary works. Specifically, we extend a text reuse detection method for finding quotations so that our system, Lotte, can handle common properties of quotations, such as ellipses or inaccurate quotations. An evaluation shows that Lotte outperforms four existing approaches. To generate key passages, we combine overlapping quotations from multiple scholarly texts. An interactive website, Annette, visualizes the key passages and makes the results accessible for exploration.
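
The step of combining overlapping quotations can be illustrated with a minimal sketch. This is not the implementation used in Lotte; it assumes that each detected quotation is available as a character span in the literary work, and the names Quotation, KeyPassage, merge_quotations and the source identifiers are hypothetical. Spans that overlap are merged into one candidate key passage, and the set of quoting scholarly texts is kept as a simple indicator of how often the passage is cited.

```python
from typing import NamedTuple


class Quotation(NamedTuple):
    start: int   # character offset in the literary work (inclusive)
    end: int     # character offset in the literary work (exclusive)
    source: str  # hypothetical identifier of the quoting scholarly text


class KeyPassage(NamedTuple):
    start: int
    end: int
    sources: frozenset  # scholarly texts that quote this passage


def merge_quotations(quotations):
    """Merge overlapping quotation spans into candidate key passages."""
    passages = []
    # Sorting by start offset lets us compare each span with the last passage only.
    for q in sorted(quotations, key=lambda q: (q.start, q.end)):
        if passages and q.start < passages[-1].end:
            # Overlap: extend the current passage and record the extra source.
            last = passages[-1]
            passages[-1] = KeyPassage(
                last.start, max(last.end, q.end), last.sources | {q.source}
            )
        else:
            # No overlap: start a new passage.
            passages.append(KeyPassage(q.start, q.end, frozenset({q.source})))
    return passages


# Assumed example data: two scholars quote overlapping parts of the same passage.
quotes = [
    Quotation(120, 180, "scholar_a"),
    Quotation(150, 210, "scholar_b"),
    Quotation(400, 450, "scholar_a"),
]
passages = merge_quotations(quotes)
for p in passages:
    print(p.start, p.end, sorted(p.sources))
```

In this sketch, the first two quotations are merged into a single passage spanning offsets 120 to 210 with two sources, while the third remains a separate passage. Ranking passages by the number of sources would be one plausible way to surface the most frequently quoted ones, though the abstract does not state how key passages are ranked.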