Sebastian Reichbauer

2026

Evaluating Latin and Ancient Greek Sentence Alignment through Parallel Sentence Mining
Sebastian Reichbauer | Shu Okabe | Alexander Fraser
Proceedings of the 6th International Conference on Natural Language Processing for the Digital Humanities

Cross-lingual detection of intertextuality and translation in Latin and Ancient Greek through computational approaches is of great interest for classical studies. While several systems exist for parallel sentence detection, based on general multilingual or specific models for Latin–Ancient Greek, they have not been compared against each other. Therefore, we present a synthetic benchmark to evaluate the performance of language models regarding cross-lingual Ancient Greek and Latin parallel sentence mining. We first compare six language models to encode sentences and then further improve the cross-lingual alignment through post-processing, fine-tuning, and knowledge distillation. We find that the whitening transformation in combination with knowledge distillation provides excellent results. Specifically, SPhilBERTa, a trilingual language model for Ancient Greek and Latin, benefits the most from the improvements and achieves a substantial mining score of 97.6 on our benchmark.
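The abstract names the whitening transformation as a post-processing step but does not spell it out. The following is a minimal sketch of the generic technique as it is commonly applied to sentence embeddings, followed by a simple nearest-neighbour mining step based on cosine similarity. It is not the authors' implementation: the function names (`fit_whitening`, `whiten`, `mine_pairs`), the choice to fit on one pooled embedding matrix, and the argmax retrieval rule are all assumptions made for illustration, and the embeddings are expected to come from whichever sentence encoder is being evaluated.

```python
import numpy as np


def fit_whitening(embeddings, eps=1e-12):
    """Estimate whitening parameters from an (n, d) embedding matrix.

    Returns (mu, W) such that (x - mu) @ W has zero mean and
    (approximately) identity covariance on the fitting data.
    """
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov((embeddings - mu).T)           # (d, d) sample covariance
    u, s, _ = np.linalg.svd(cov)                # cov = u @ diag(s) @ u.T
    W = u @ np.diag(1.0 / np.sqrt(s + eps))     # eps guards against s == 0
    return mu, W


def whiten(embeddings, mu, W):
    """Apply a previously fitted whitening transformation."""
    return (embeddings - mu) @ W


def mine_pairs(src, tgt):
    """For each source row, return the index of the most similar target row.

    Similarity is cosine similarity; this simple argmax rule is an
    illustrative assumption, not the scoring used in the paper.
    """
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sim = src @ tgt.T                           # (n_src, n_tgt) cosine matrix
    return sim.argmax(axis=1)
```

A possible usage, again under the assumption that Latin and Ancient Greek embeddings are pooled to fit the transformation: call `fit_whitening` on the stacked matrix of both languages, apply `whiten` to each side with the same `mu` and `W`, and pass the two whitened matrices to `mine_pairs`.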

Co-authors

Alexander Fraser 1
Shu Okabe 1

Venues

NLP4DH 1
WS 1
