Kebed Zagidov

2026

AvarLab: An Integrated Digital Ecosystem for Avar, a Morphologically Rich Low-Resource Language
Kebed Zagidov | Thomas Brochhagen
Proceedings of the Ninth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-9)

This paper presents a digital ecosystem designed for Avar, a morphologically rich and vulnerable Northeast Caucasian language. Addressing the common bottleneck where lexical resources, corpora, and computational tools are developed in isolation or are entirely absent, we propose the "generate-verify" workflow. By developing a scalable, rule-based computational architecture, our system specifically targets the challenges of low-resource settings, overcoming data sparsity to generate over one million inflected forms from a static dictionary of 14,700 entries.Furthermore, by coupling morphological generation with corpus verification, we introduce a dynamic method to rapidly analyze and expand endangered language data. This approach transforms static linguistic documentation into active language reclamation tools, supporting dictionary lookup and the creation of silver-standard annotations for downstream NLP. The platform also serves as a unified model for the collection, management, and mobilization of fragmented language data, ensuring that the resulting resources are directly accessible and beneficial to the speaker community. Ultimately, AvarLab provides a practical, adaptable pathway for building sustainable digital infrastructure by fostering interaction among documentary linguists, computer scientists, and native speakers.

Co-authors

Thomas Brochhagen 1

Venues

ComputEL1
WS1

Fix author