From Semi-Digital Edition to Historical NLP Resource: Constructing and Annotating Historical Multilingual Parallel Text Collections on the TEITOK Platform

Maarten Janssen, Anna Jouravel, Piroska Lendvai


Abstract
We construct a multilingual, parallelized digital collection comprising a reconstructed Old Greek text from the 4th century CE and its seven historical versions, modern editions, and translations. We describe the workflow and integrated tools on the TEITOK web-based platform for ingesting, aligning, parallelizing, and morphosyntactically annotating these materials. Textual alignment is performed at both the sentence and word level, after which the data are annotated with dependency parses in the Universal Dependencies paradigm. The newly created and manually post-corrected collection can be explored via advanced parallel search functionalities and flexible visualization modes. This workflow is meant to support digital humanities and historical NLP projects by transforming the input texts into parallel NLP resources, enabling cross-fertilization and new insights across multiple research communities.
Anthology ID:
2026.lrec-main.120
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
1553–1561
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.120/
Cite (ACL):
Maarten Janssen, Anna Jouravel, and Piroska Lendvai. 2026. From Semi-Digital Edition to Historical NLP Resource: Constructing and Annotating Historical Multilingual Parallel Text Collections on the TEITOK Platform. International Conference on Language Resources and Evaluation, main:1553–1561.
Cite (Informal):
From Semi-Digital Edition to Historical NLP Resource: Constructing and Annotating Historical Multilingual Parallel Text Collections on the TEITOK Platform (Janssen et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.120.pdf