Veronika Urban
2026
Cross-Linguistic Situation Entity Segmentation for Discourse Analysis in Diachronic English and German Text
Hanna Schmück | Veronika Urban | Xaver Krückl | Sonja Zeman | Claudia Claridge | Annemarie Friedrich
Proceedings of the 20th Linguistic Annotation Workshop (LAW XX)
Situation Entity (SE) segmentation identifies clause-like discourse units centered on verb constellations. SE segmentation has been applied to contemporary English as a subtask of SE annotation, but systematic guidelines for syntactically ambiguous constructions remain underspecified. We present principled SE segmentation guidelines for contemporary and historical varieties of English and German. Our inter-annotator agreement studies on Late Modern English (1700–1900) and New High German (1650–1900) corpora show substantial agreement. Using the existing SitEnt corpus of contemporary English, we implement a new automatic segmenter based on XLM-RoBERTa. Our evaluation examines cross-variety and cross-lingual generalization, revealing challenges both for human annotation and for transferring segmenters trained on contemporary English to historical varieties. Our code and data are publicly available at https://github.com/coling-unia/sitent-segmenter-law2026.
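The abstract does not say how the XLM-RoBERTa segmenter frames the task. One common framing, assumed here purely for illustration, is token-level tagging in which each token is labelled B (begins a new segment) or I (continues the current one). The sketch below shows how such labels could be decoded into segments and how a predicted segmentation could be compared with a reference using boundary-level F1. The B/I scheme, the metric, and the names `labels_to_segments` and `boundary_f1` are hypothetical and are not taken from the paper or its repository.

```python
from typing import List, Tuple


def labels_to_segments(labels: List[str]) -> List[Tuple[int, int]]:
    """Decode per-token B/I labels into (start, end) spans, end exclusive.

    Hypothetical scheme: "B" opens a new segment, "I" continues the current
    one. A leading "I" is treated leniently as opening the first segment.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, label in enumerate(labels):
        if label == "B" or start is None:
            if start is not None:
                # Close the segment that was open before this "B".
                segments.append((start, i))
            start = i
    if start is not None:
        segments.append((start, len(labels)))
    return segments


def boundary_f1(gold: List[str], pred: List[str]) -> float:
    """F1 over segment-start positions, ignoring the trivial boundary at token 0."""
    gold_b = {s for s, _ in labels_to_segments(gold) if s > 0}
    pred_b = {s for s, _ in labels_to_segments(pred) if s > 0}
    if not gold_b and not pred_b:
        return 1.0  # both sides agree there is only one segment
    tp = len(gold_b & pred_b)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_b)
    recall = tp / len(gold_b)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["B", "I", "B", "I", "B", "I"]  # three reference segments
    pred = ["B", "I", "B", "I", "I", "I"]  # segmenter misses the last boundary
    print(labels_to_segments(gold))
    print(boundary_f1(gold, pred))
```

Scoring only boundaries keeps the example simple. An exact-span match would be a stricter alternative, and the paper's own evaluation metric may differ from either.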