AfriMMT-EA: Multi-domain Machine Translation for Low-Resource East African Languages

Naome A Etori, Kelechi Ezema, Nathaniel Romney Robinson, Davis David, Alfred Malengo Kondoro, Elisha Ondieki Makori, Michael Samwel Mollel, Maria Gini


Abstract
Despite remarkable progress in multilingual machine translation (MT), most African languages, and East African languages in particular, remain significantly underrepresented in both benchmark datasets and state-of-the-art (SOTA) MT models. This persistent exclusion from mainstream technologies not only limits equitable access but also constrains the development of tools that accurately reflect the region’s linguistic and cultural diversity. Recent open-source large language models have demonstrated strong multilingual MT capabilities through data-efficient adaptation strategies; however, little work has explored their potential for low-resource African languages. We introduce AfriMMT-EA, the first highly multilingual benchmark and MT dataset for East African languages. Our datasets cover 54 local languages across five East African countries. We use these data to fine-tune two multilingual versions of Gemma-3 and compare the fine-tuned models’ performance on these languages with that of larger off-the-shelf baselines. We release our data and models to advance MT for these low-resource languages and their communities.
Anthology ID: 2026.findings-eacl.179
Volume: Findings of the Association for Computational Linguistics: EACL 2026
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3459–3492
URL: https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.179/
Cite (ACL): Naome A Etori, Kelechi Ezema, Nathaniel Romney Robinson, Davis David, Alfred Malengo Kondoro, Elisha Ondieki Makori, Michael Samwel Mollel, and Maria Gini. 2026. AfriMMT-EA: Multi-domain Machine Translation for Low-Resource East African Languages. In Findings of the Association for Computational Linguistics: EACL 2026, pages 3459–3492, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): AfriMMT-EA: Multi-domain Machine Translation for Low-Resource East African Languages (Etori et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.179.pdf
Checklist: 2026.findings-eacl.179.checklist.pdf