VerbCraft: Morphologically-Aware Armenian Text Generation Using LLMs in Low-Resource Settings

Hayastan Avetisyan, David Broneske


Abstract
Understanding and generating morphologically complex verb forms is a critical challenge in Natural Language Processing (NLP), particularly for low-resource languages such as Armenian. Armenian’s verb morphology encodes multiple layers of grammatical information (tense, aspect, mood, voice, person, and number), which requires nuanced computational modeling. We introduce VerbCraft, a novel neural model that integrates explicit morphological classifiers into the mBART-50 architecture. VerbCraft achieves a BLEU score of 0.4899 on test data, compared to the baseline’s 0.9975, a gap that reflects its prioritization of morphological precision over fluency. It reaches over 99% accuracy in aspect and voice predictions and performs robustly on rare and irregular verb forms, and it addresses data scarcity through synthetic data generation with human-in-the-loop validation. Beyond Armenian, it offers a scalable framework for morphologically rich, low-resource languages, paving the way for linguistically informed NLP systems and advancing language preservation efforts.
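One common way to attach explicit classifiers to a sequence-to-sequence model is a joint training objective: the usual generation loss plus a weighted sum of per-feature classification losses. The abstract does not state how VerbCraft combines these signals, so the sketch below is only an illustration under that assumption. The names `cross_entropy`, `joint_loss`, and `aux_weight` are hypothetical and do not come from the paper.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of index `target` under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


def joint_loss(generation_loss, feature_logits, feature_targets, aux_weight=0.5):
    """Generation loss plus a weighted mean of morphological classifier losses.

    feature_logits:  dict mapping a feature name (e.g. "aspect") to a list of logits.
    feature_targets: dict mapping the same names to gold label indices.
    aux_weight:      assumed hyperparameter, not a value reported in the paper.
    """
    aux = [
        cross_entropy(feature_logits[name], feature_targets[name])
        for name in feature_targets
    ]
    aux_mean = sum(aux) / len(aux) if aux else 0.0
    return generation_loss + aux_weight * aux_mean
```

In a setup like this, raising `aux_weight` pushes training toward correct feature predictions at the possible expense of surface overlap with the reference, which is one plausible reading of the trade-off the abstract describes.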
Anthology ID:
2025.resourceful-1.25
Volume:
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
Month:
March
Year:
2025
Address:
Tallinn, Estonia
Editors:
Špela Arhar Holdt, Nikolai Ilinykh, Barbara Scalvini, Micaella Bruton, Iben Nyholm Debess, Crina Madalina Tudor
Venues:
RESOURCEFUL | WS
Publisher:
University of Tartu Library, Estonia
Pages:
111–119
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.resourceful-1.25/
Cite (ACL):
Hayastan Avetisyan and David Broneske. 2025. VerbCraft: Morphologically-Aware Armenian Text Generation Using LLMs in Low-Resource Settings. In Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025), pages 111–119, Tallinn, Estonia. University of Tartu Library, Estonia.
Cite (Informal):
VerbCraft: Morphologically-Aware Armenian Text Generation Using LLMs in Low-Resource Settings (Avetisyan & Broneske, RESOURCEFUL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.resourceful-1.25.pdf