Kirstine Nielsen Degn
2025
Annotating and Classifying Direct Speech in Historical Danish and Norwegian Literary Texts
Ali Al-Laith
|
Alexander Conroy
|
Kirstine Nielsen Degn
|
Jens Bjerring-Hansen
|
Daniel Hershcovich
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)
Analyzing direct speech in historical literary texts provides insights into character dynamics, narrative style, and discourse patterns. In late 19th century Danish and Norwegian fiction direct speech reflects characters’ social and geographical backgrounds. However, inconsistent typographic conventions in Scandinavian literature complicate computational methods for distinguishing direct speech from other narrative elements. To address this, we introduce an annotated dataset from the MeMo corpus, capturing speech markers and tags in Danish and Norwegian novels. We evaluate pre-trained language models for classifying direct speech, with results showing that a Danish Foundation Model (DFM), trained on extensive Danish data, has the highest performance. Finally, we conduct a classifier-assisted quantitative corpus analysis and find a downward trend in the prevalence of speech over time.
2023
Sentiment Classification of Historical Danish and Norwegian Literary Texts
Ali Al-Laith
|
Kirstine Nielsen Degn
|
Alexander Conroy
|
Bolette Sandford Pedersen
|
Jens Bjerring-Hansen
|
Daniel Hershcovich
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Sentiment classification is valuable for literary analysis, as sentiment is crucial in literary narratives. It can, for example, be used to investigate a hypothesis in the literary analysis of 19th-century Scandinavian novels that the writing of female authors in this period was characterized by negative sentiment, as this paper shows. In order to enable a data-driven analysis of this hypothesis, we create a manually annotated dataset of sentence-level sentiment annotations for novels from this period and use it to train and evaluate various sentiment classification methods. We find that pre-trained multilingual language models outperform models trained on modern Danish, as well as classifiers based on lexical resources. Finally, in classifier-assisted corpus analysis, we confirm the literary hypothesis regarding the author’s gender and further shed light on the temporal development of the trend. Our dataset and trained models will be useful for future analysis of historical Danish and Norwegian literary texts.