Twenty’s Plenty: Semantic Scaffolding and Span Architecture for 19-Label NER in Medieval Latin Charters

Tamás Kovács, Giuseppe Consolo, Georg Vogeler


Abstract
This study investigates whether a high-quality, 19-label named entity recogniser for medieval Latin charters can be built from only a few hundred annotated sentences. The authors introduce "semantic scaffolding," which uses richly descriptive English label phrases as prompts to activate latent multilingual knowledge within the model. This is paired with a custom span-based architecture built on XLM-RoBERTa-large, with 4-head attention pooling to handle long property descriptions and a hybrid loss that combines Asymmetric Focal-Dice and InfoNCE contrastive terms. Results show that semantic scaffolding enables a fine-tuned GLiNER to reach 80.8% overlap F1, while the custom architecture achieves 83.4% overlap F1 using only 298 training sentences. The paper also shows empirically that domain-specific pre-training on medieval Latin offers no performance advantage once task-specific fine-tuning is applied. While the model excels at frequent categories such as PER (95.7% F1) and LOC (93.5% F1), challenges persist for rare, position-dependent legal categories such as LEG (53.1% F1) and TRANS (52.6% F1).
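The abstract reports "overlap F1" without defining it. As a minimal sketch, assuming the common relaxed-matching convention (a predicted span counts as correct when it shares at least one token with a gold span of the same label), the metric could be computed as below. The `Span` layout and the function names are hypothetical and chosen for illustration; the paper's exact matching rule may differ.

```python
from typing import List, Tuple

# Hypothetical span layout: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True when both spans carry the same label and share at least one token."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def overlap_f1(gold: List[Span], pred: List[Span]) -> float:
    """Relaxed F1: a span is matched if any same-label span on the other side overlaps it."""
    if not gold or not pred:
        return 0.0
    # Precision: share of predicted spans that touch some gold span.
    precision = sum(any(overlaps(p, g) for g in gold) for p in pred) / len(pred)
    # Recall: share of gold spans that are touched by some predicted span.
    recall = sum(any(overlaps(g, p) for p in pred) for g in gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 2, "PER"), (5, 9, "LOC")]
    pred = [(1, 2, "PER"), (5, 7, "PER")]  # second span has the wrong label
    print(overlap_f1(gold, pred))
```

Under this assumed definition, a partially recovered span still earns credit as long as its label is right, which is why overlap scores tend to sit above exact-match scores for long spans such as property descriptions.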
Anthology ID:
2026.nlp4dh-1.22
Volume:
Proceedings of the 6th International Conference on Natural Language Processing for the Digital Humanities
Month:
July
Year:
2026
Address:
San Diego, USA
Editors:
Sil Hamilton, Emily Öhman, Rebecca M. M. Hicke, Yuri Bizzoni, Axel Bax, Jacob A. Matthews, Mika Hämäläinen
Venues:
NLP4DH | WS
Publisher:
Association for Computational Linguistics
Pages:
236–241
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.nlp4dh-1.22/
Cite (ACL):
Tamás Kovács, Giuseppe Consolo, and Georg Vogeler. 2026. Twenty’s Plenty: Semantic Scaffolding and Span Architecture for 19-Label NER in Medieval Latin Charters. In Proceedings of the 6th International Conference on Natural Language Processing for the Digital Humanities, pages 236–241, San Diego, USA. Association for Computational Linguistics.
Cite (Informal):
Twenty’s Plenty: Semantic Scaffolding and Span Architecture for 19-Label NER in Medieval Latin Charters (Kovács et al., NLP4DH 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.nlp4dh-1.22.pdf