AvarLab: An Integrated Digital Ecosystem for Avar, a Morphologically Rich Low-Resource Language

Kebed Zagidov, Thomas Brochhagen


Abstract
This paper presents a digital ecosystem designed for Avar, a morphologically rich and vulnerable Northeast Caucasian language. Addressing the common bottleneck where lexical resources, corpora, and computational tools are developed in isolation or are entirely absent, we propose a "generate-verify" workflow. By developing a scalable, rule-based computational architecture, our system specifically targets the challenges of low-resource settings, overcoming data sparsity to generate over one million inflected forms from a static dictionary of 14,700 entries. Furthermore, by coupling morphological generation with corpus verification, we introduce a dynamic method to rapidly analyze and expand endangered language data. This approach transforms static linguistic documentation into active language reclamation tools, supporting dictionary lookup and the creation of silver-standard annotations for downstream NLP. The platform also serves as a unified model for the collection, management, and mobilization of fragmented language data, ensuring that the resulting resources are directly accessible and beneficial to the speaker community. Ultimately, AvarLab provides a practical, adaptable pathway for building sustainable digital infrastructure by fostering interaction among documentary linguists, computer scientists, and native speakers.
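The "generate-verify" idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `generate` and `verify`, the `TOY_RULES` table, and the toy lemmas and suffixes are hypothetical placeholders, not real Avar morphology and not AvarLab's actual rules or code. The sketch only shows the general shape of expanding dictionary entries by rule and then checking the candidates against corpus tokens.

```python
from collections import Counter

# Hypothetical placeholder suffixes per part of speech.
# These are invented strings, NOT real Avar inflection.
TOY_RULES = {
    "noun": ["", "a", "al", "alda"],
    "verb": ["", "i", "ana"],
}


def generate(lexicon, rules):
    """Expand (lemma, pos) entries into candidate forms.

    Returns a dict mapping each surface form to the list of
    (lemma, pos, suffix) analyses that produce it.
    """
    forms = {}
    for lemma, pos in lexicon:
        for suffix in rules.get(pos, [""]):
            forms.setdefault(lemma + suffix, []).append((lemma, pos, suffix))
    return forms


def verify(forms, corpus_tokens):
    """Split the evidence into attested forms and unanalyzed tokens.

    attested: generated forms that occur in the corpus, with their
              analyses (candidate silver-standard annotations).
    unknown:  corpus tokens no rule produced (candidates for new
              dictionary entries or missing rules).
    """
    counts = Counter(corpus_tokens)
    attested = {form: analyses for form, analyses in forms.items()
                if counts[form] > 0}
    unknown = sorted(token for token in counts if token not in forms)
    return attested, unknown


# Toy data: invented lemmas and tokens, for illustration only.
lexicon = [("kar", "noun"), ("bos", "verb")]
corpus = ["kara", "bosi", "zzz", "kara"]

forms = generate(lexicon, TOY_RULES)
attested, unknown = verify(forms, corpus)
print(len(forms), sorted(attested), unknown)
```

In a sketch like this, the attested forms carry their analyses and could seed silver-standard annotations, while the unknown tokens point to gaps in the dictionary or the rule set.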
Anthology ID:
2026.computel-1.7
Volume:
Proceedings of the Ninth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-9)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Godfred Agyapong, Sarah Moeller, Antti Arppe, Ali Marashian, Daisy Rosenblum
Venues:
ComputEL | WS
Publisher:
Association for Computational Linguistics
Pages:
62–71
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.computel-1.7/
Cite (ACL):
Kebed Zagidov and Thomas Brochhagen. 2026. AvarLab: An Integrated Digital Ecosystem for Avar, a Morphologically Rich Low-Resource Language. In Proceedings of the Ninth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-9), pages 62–71, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
AvarLab: An Integrated Digital Ecosystem for Avar, a Morphologically Rich Low-Resource Language (Zagidov & Brochhagen, ComputEL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.computel-1.7.pdf
Supplementary material:
2026.computel-1.7.SupplementaryMaterial.txt