Sophie Jasmin Spliethoff

2026

Prompting Across Time: Evaluating LLMs on Historical and Contemporary Offensive Language
Sanne Hoeken | Sophie Jasmin Spliethoff | Silke Schwandt | Özge Alacam | Sina Zarrieß
Findings of the Association for Computational Linguistics: ACL 2026

Research on hate speech detection (HSD) has centered on modern data, even though offensive language has a much longer history. This paper presents the first systematic evaluation of instruction-tuned LLMs on Early Modern English invectives, compared with a modern hate-speech benchmark. Our work applies a modular prompt design to measure the contribution of definitional richness, contextual grounding, decision rules and few-shot examples. The results indicate that clearer annotation boundaries in the curated historical corpus lead to higher classification performance than on the modern benchmark, despite the disadvantage of linguistic unfamiliarity. Prompt brittleness, however, persists across both domains. Classification-oriented components (rules, examples) drive the strongest effects, while definitional or contextual additions matter less. Fine-tuned encoder models still outperform LLMs, but some prompt configurations can narrow the gap. Overall, our study provides practical guidance for prompt design in both digital humanities and HSD, and opens new opportunities for tracing the historical development of hate speech.
