Fostering Digital Inclusion for Low-Resource Nigerian Languages: A Case Study of Igbo and Nigerian Pidgin

Ebelechukwu Nwafor, Minh Phuc Nguyen


Abstract
Current state-of-the-art large language models (LLMs) like GPT-4 perform exceptionally well in language translation tasks for high-resource languages, such as English, but often lack high accuracy results for low-resource African languages such as Igbo and Nigerian Pidgin, two native languages in Nigeria. This study addresses the need for Artificial Intelligence (AI) linguistic diversity by creating benchmark datasets for Igbo-English and Nigerian Pidgin-English language translation tasks. The dataset developed is curated from reputable online sources and meticulously annotated by crowd-sourced native-speaking human annotators. Using the datasets, we evaluate the translation abilities of GPT-based models alongside other state-of-the-art translation models specifically designed for low-resource languages. Our results demonstrate that current state-of-the-art models outperform GPT-based models in translation tasks. In addition, these datasets can significantly enhance LLM performance in these translation tasks, marking a step toward reducing linguistic bias and promoting more inclusive AI models.
Anthology ID:
2025.loresmt-1.6
Volume:
Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, U.S.A.
Editors:
Atul Kr. Ojha, Chao-hong Liu, Ekaterina Vylomova, Flammie Pirinen, Jonathan Washington, Nathaniel Oco, Xiaobing Zhao
Venues:
LoResMT | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
44–53
Language:
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.loresmt-1.6/
DOI:
Bibkey:
Cite (ACL):
Ebelechukwu Nwafor and Minh Phuc Nguyen. 2025. Fostering Digital Inclusion for Low-Resource Nigerian Languages: A Case Study of Igbo and Nigerian Pidgin. In Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025), pages 44–53, Albuquerque, New Mexico, U.S.A.. Association for Computational Linguistics.
Cite (Informal):
Fostering Digital Inclusion for Low-Resource Nigerian Languages: A Case Study of Igbo and Nigerian Pidgin (Nwafor & Nguyen, LoResMT 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.loresmt-1.6.pdf