The Megrelian Language Corpus (MLC): Creation, Annotation, and Initial Steps toward a UD Treebank

Irina Lobzhanidze, Rusudan Gersamia, Tamar Gogia


Abstract
This paper presents the development of the Megrelian Language Corpus (MLC), a new language resource for the documentation and computational analysis of Megrelian, an endangered Kartvelian language. The corpus is based on fieldwork conducted in Samegrelo, Georgia (2022–2024) and currently contains 97,691 tokens and 60,959 types. The data were transcribed in the International Phonetic Alphabet (IPA) and annotated in FieldWorks Language Explorer (FLEx) with segmentation, morphological analysis and bilingual Georgian-English translations. Each text is accessible through a specially designed web interface, which provides multiple annotation tiers and integrated search functions. The paper describes the corpus design, the annotation methodology and the challenges encountered in representing Megrelian’s complex agglutinative morphology. It also outlines initial steps toward converting the existing data into the Universal Dependencies (UD) framework, building on experience from related Kartvelian languages such as Georgian. The MLC is the first publicly available linguistic resource for Megrelian and provides a foundation for future UD treebank development.
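For readers unfamiliar with the UD target format, the sketch below shows what a single token looks like once serialized as a CoNLL-U line (ten tab-separated columns, with features written as sorted `Name=Value` pairs). It is not the authors' conversion pipeline: the `Token` class, the `to_conllu` helper and the placeholder values (`FORM`, `LEMMA`, `GLOSS`) are hypothetical, and no real Megrelian data are shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical container for one annotated word (not a FLEx or UD API)."""
    id: int
    form: str
    lemma: str = "_"
    upos: str = "_"
    feats: dict[str, str] = field(default_factory=dict)
    head: int | None = None  # None is written as "_" (unattached placeholder)
    deprel: str = "_"
    misc: dict[str, str] = field(default_factory=dict)


def kv_to_str(pairs: dict[str, str]) -> str:
    """Serialize Key=Value pairs joined by '|', sorted case-insensitively by key; '_' if empty."""
    if not pairs:
        return "_"
    items = sorted(pairs.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{k}={v}" for k, v in items)


def to_conllu(tok: Token) -> str:
    """Build the 10 CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC."""
    cols = [
        str(tok.id),
        tok.form,
        tok.lemma,
        tok.upos,
        "_",                      # XPOS left empty in this sketch
        kv_to_str(tok.feats),
        "_" if tok.head is None else str(tok.head),
        tok.deprel,
        "_",                      # enhanced DEPS left empty in this sketch
        kv_to_str(tok.misc),      # MISC reuses the same Key=Value|... serialization
    ]
    return "\t".join(cols)


if __name__ == "__main__":
    # Placeholder values only; a real conversion would take them from the FLEx annotation tiers.
    tok = Token(1, "FORM", lemma="LEMMA", upos="VERB",
                feats={"Tense": "Past", "Number": "Sing"},
                head=0, deprel="root", misc={"Gloss": "GLOSS"})
    print(to_conllu(tok))
```

Keeping the English or Georgian translation in MISC (here under an assumed `Gloss` key) is one way to carry the bilingual glosses into the treebank without touching the morphosyntactic columns.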
Anthology ID:
2026.lrec-main.255
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
3250–3256
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.255/
Cite (ACL):
Irina Lobzhanidze, Rusudan Gersamia, and Tamar Gogia. 2026. The Megrelian Language Corpus (MLC): Creation, Annotation, and Initial Steps toward a UD Treebank. International Conference on Language Resources and Evaluation, main:3250–3256.
Cite (Informal):
The Megrelian Language Corpus (MLC): Creation, Annotation, and Initial Steps toward a UD Treebank (Lobzhanidze et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.255.pdf