LLMs as Annotators: Evaluating Model–Human Alignment in Detecting Contentious Language in Historical Corpora

Yahui Zhao; Clemencia Siro; Laura Hollink

Abstract

Historical texts often contain terminology that reflects outdated or harmful social values. Identifying such contentious terms is essential for the Galleries, Libraries, Archives, and Museums (GLAM) community, but manual annotation requires cultural expertise and is difficult to scale. This study evaluates whether large language models (LLMs) can support this process by aligning with human judgments of contentiousness in historical Dutch corpora. Using the Dutch Contentious Contexts Corpus (ConConCor), we formalize the task as context-dependent binary classification and compare two LLMs across multiple prompt configurations and evaluation scenarios. The models achieve near-human-level agreement on explicit cases but diverge when contextual or historical reasoning is required. Analysis of disagreement patterns shows that LLMs capture overtly harmful expressions yet tend to over-predict contentiousness for identity-related and colonial terms and under-predict for semantically shifted or figurative uses. These findings suggest that LLMs can act as auxiliary annotators for sensitive language detection in historical materials, provided that human oversight and contextual interpretation remain central to annotation workflows.

Anthology ID: 2026.lrec-main.852
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 10883–10896
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.852/
Cite (ACL): Yahui Zhao, Clemencia Siro, and Laura Hollink. 2026. LLMs as Annotators: Evaluating Model–Human Alignment in Detecting Contentious Language in Historical Corpora. International Conference on Language Resources and Evaluation, main:10883–10896.
Cite (Informal): LLMs as Annotators: Evaluating Model–Human Alignment in Detecting Contentious Language in Historical Corpora (Zhao et al., LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.852.pdf
