Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan

Ahan Chatterjee, Matthias Schöffel, Matthias Aßenmacher, Marinus Wiedner, Esteban Garces Arias


Abstract
The diachronic evolution from Latin to the Romance languages involved a restructuring of the grammatical gender system from a tripartite configuration (masculine, feminine, neuter) to a bipartite one (masculine, feminine). In this work, we introduce an interpretable deep learning framework to investigate this phenomenon at both lexical and contextual levels. First, we show that conventional tokenization strategies are insufficiently robust for this low-resource historical setting, and that our proposed tokenizer improves performance over these baselines. At the lexical level, we evaluate the contribution of morphological features to gender prediction. At the contextual level, we quantify the contributions of different part-of-speech categories to grammatical gender prediction. Together, these analyses characterize the distribution of gender information between the lemma and its sentential context. We make our codebase, datasets, and results publicly available.
Anthology ID:
2026.nlp4dh-1.26
Volume:
Proceedings of the 6th International Conference on Natural Language Processing for the Digital Humanities
Month:
July
Year:
2026
Address:
San Diego, USA
Editors:
Sil Hamilton, Emily Öhman, Rebecca M. M. Hicke, Yuri Bizzoni, Axel Bax, Jacob A. Matthews, Mika Hämäläinen
Venues:
NLP4DH | WS
Publisher:
Association for Computational Linguistics
Pages:
276–296
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.nlp4dh-1.26/
Cite (ACL):
Ahan Chatterjee, Matthias Schöffel, Matthias Aßenmacher, Marinus Wiedner, and Esteban Garces Arias. 2026. Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan. In Proceedings of the 6th International Conference on Natural Language Processing for the Digital Humanities, pages 276–296, San Diego, USA. Association for Computational Linguistics.
Cite (Informal):
Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan (Chatterjee et al., NLP4DH 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.nlp4dh-1.26.pdf