Giuseppe Consolo
2026
Twenty’s Plenty: Semantic Scaffolding and Span Architecture for 19-Label NER in Medieval Latin Charters
Tamás Kovács | Giuseppe Consolo | Georg Vogeler
Proceedings of the 6th International Conference on Natural Language Processing for the Digital Humanities
This study investigates whether a high-quality, 19-label named entity recogniser for medieval Latin charters can be built from only a few hundred annotated sentences. The authors introduce "semantic scaffolding," a technique that uses richly descriptive English label phrases as prompts to activate latent multilingual knowledge in the model. It is paired with a custom span-based architecture built on XLM-RoBERTa-large, with 4-head attention pooling to handle long property descriptions and a hybrid loss that includes Asymmetric Focal-Dice and InfoNCE contrastive terms. With semantic scaffolding, a fine-tuned GLiNER reaches 80.8% overlap F1, while the custom architecture achieves 83.4% overlap F1 using only 298 training sentences. Notably, the paper shows empirically that domain-specific pre-training on medieval Latin offers no performance advantage once task-specific fine-tuning is applied. The model excels at frequent categories such as PER (95.7% F1) and LOC (93.5% F1), but rare, position-dependent legal categories such as LEG (53.1% F1) and TRANS (52.6% F1) remain challenging.
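The headline figures above are overlap F1 scores. The abstract does not spell out the matching rule, so the following is only a minimal sketch under one common assumption: a predicted span counts as correct if it shares at least one token with a gold span of the same label, and a gold span counts as found under the same condition. The `Span` layout, the function names, and the toy spans are illustrative and are not taken from the paper.

```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, label). This layout is an assumption.
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans carry the same label and share at least one token."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def overlap_f1(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under relaxed, overlap-based matching."""
    # Precision: share of predicted spans that overlap some gold span.
    pred_hits = sum(any(overlaps(p, g) for g in gold) for p in pred)
    # Recall: share of gold spans that are overlapped by some predicted span.
    gold_hits = sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = pred_hits / len(pred) if pred else 0.0
    recall = gold_hits / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example (invented spans, not data from the paper).
gold_spans = [(0, 2, "PER"), (5, 9, "LOC"), (12, 14, "LEG")]
pred_spans = [(0, 1, "PER"), (6, 9, "LOC"), (12, 14, "TRANS"), (20, 22, "PER")]
p, r, f1 = overlap_f1(gold_spans, pred_spans)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In the toy example, the partially matching PER and LOC predictions are credited, while the span predicted as TRANS over a gold LEG span is not, since the labels differ. Exact-match scoring would be stricter and would reject the two partial matches as well.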