YodiV3: NLP for Togolese Languages with Eyaa-Tom Dataset and the Lom Metric

Bakoubolo Essowe Justin, Kodjo François Xegbe, Catherine Nana Nyaah Essuman, Afola Kossi Mawouéna Samuel


Abstract
Most of the 40+ languages spoken in Togo are severely under-represented in Natural Language Processing (NLP) resources. We present YodiV3, a comprehensive approach to developing NLP for ten Togolese languages (plus two major lingua francas) covering machine translation, speech recognition, text-to-speech, and language identification. We introduce Eyaa-Tom, a new multi-domain parallel corpus (religious, healthcare, financial, etc.) for these languages. We also propose the Lom metric, a scoring framework to quantify the AI-readiness of each language in terms of available resources. Our experiments demonstrate that leveraging large pretrained models (e.g., NLLB for translation, MMS for speech) with YodiV3 leads to significant improvements in low-resource translation and speech tasks. This work highlights the impact of integrating diverse data sources and pretrained models to bootstrap NLP for under-served languages, and outlines future steps for expanding coverage and capability.
Anthology ID:
2025.africanlp-1.20
Volume:
Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Constantine Lignos, Idris Abdulmumin, David Adelani
Venues:
AfricaNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
143–149
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.africanlp-1.20/
Cite (ACL):
Bakoubolo Essowe Justin, Kodjo François Xegbe, Catherine Nana Nyaah Essuman, and Afola Kossi Mawouéna Samuel. 2025. YodiV3: NLP for Togolese Languages with Eyaa-Tom Dataset and the Lom Metric. In Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025), pages 143–149, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
YodiV3: NLP for Togolese Languages with Eyaa-Tom Dataset and the Lom Metric (Justin et al., AfricaNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.africanlp-1.20.pdf