Ife Adebara


2025

Where Are We? Evaluating LLM Performance on African Languages
Ife Adebara | Hawau Olamide Toyin | Nahom Tesfu Ghebremichael | AbdelRahim A. Elmadany | Muhammad Abdul-Mageed
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Africa’s rich linguistic heritage remains underrepresented in NLP, largely due to historical policies that favor foreign languages and create significant data inequities. In this paper, we integrate theoretical insights on Africa’s language landscape with an empirical evaluation using Sahara—a comprehensive benchmark curated from large-scale, publicly accessible datasets capturing the continent’s linguistic diversity. By systematically assessing the performance of leading large language models (LLMs) on Sahara, we demonstrate how policy-induced data variations directly impact model effectiveness across African languages. Our findings reveal that while a few languages perform reasonably well, many Indigenous languages remain marginalized due to sparse data. Leveraging these insights, we offer actionable recommendations for policy reforms and inclusive data practices. Overall, our work underscores the urgent need for a dual approach—combining theoretical understanding with empirical evaluation—to foster linguistic diversity in AI for African communities.

Retrieval-Augmented Generation Meets Local Languages for Improved Drug Information Access and Comprehension
Ahmad Ibrahim Ismail | Bashirudeen Opeyemi Ibrahim | Olubayo Adekanmbi | Ife Adebara
Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)

Medication errors are among the leading causes of avoidable harm in healthcare systems across the world. A large portion of these errors stems from inefficient information retrieval processes and a lack of comprehension of drug information. In low-resource settings, these issues are exacerbated by limited access to updated and reliable sources, technological constraints, and linguistic barriers. Innovations that improve the retrieval and comprehension of drug-related information are therefore poised to reduce medication errors and improve patient outcomes. This research employed open-source Retrieval-Augmented Generation (RAG) integrated with multilingual translation and Text-to-Speech (TTS) systems. Using open-source tools, a corpus was created from prominent sources of medical information in Nigeria and stored as high-level text embeddings in a Chroma database. Upon user query, relevant drug information is retrieved and synthesized using a large language model. The output can be translated into Yoruba, Igbo, and Hausa, and converted into speech through the TTS system, addressing the linguistic accessibility gap. Evaluation of the system by domain experts indicated strong overall translation performance, with an average accuracy of 73% and the best results observed in Hausa and Yoruba. TTS results were moderately effective (mean = 57%), with Igbo scoring highest in speech clarity (68%). However, tonal complexity, especially in Yoruba, posed challenges for accurate pronunciation, highlighting the need for language-specific model fine-tuning. Addressing these linguistic nuances is essential to optimize comprehension and practical utility in diverse healthcare settings. The results demonstrate the system's potential to improve access to drug information, enhance comprehension, and reduce linguistic barriers. These technologies could substantially mitigate medication errors and improve patient safety. This study offers valuable insights and practical guidelines for future implementations aimed at strengthening global medication safety practices.
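
As an illustration of the retrieval step described above, here is a minimal sketch using the chromadb Python client. The collection name, toy documents, and query are hypothetical; the paper's actual corpus, embedding model, LLM, and the translation/TTS stages are not reproduced here.

```python
# Minimal RAG retrieval sketch (hypothetical corpus and names).
import chromadb

client = chromadb.Client()  # in-memory client; uses the default embedding function
collection = client.create_collection("drug_info")

# Index a toy corpus of drug monograph snippets (placeholders, not real guidance).
collection.add(
    ids=["doc1", "doc2"],
    documents=[
        "Paracetamol: adult oral dose is typically 500 mg to 1 g every 4-6 hours.",
        "Amoxicillin: common adverse effects include rash and gastrointestinal upset.",
    ],
)

# Retrieve the passage most similar to the user's query.
query = "What is the usual adult dose of paracetamol?"
results = collection.query(query_texts=[query], n_results=1)
context = results["documents"][0][0]

# A real system would now pass `context` and `query` to an LLM, then
# optionally translate the answer and synthesize speech with a TTS model.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```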

Beyond Generalization: Evaluating Multilingual LLMs for Yorùbá Animal Health Translation
Godwin Adegbehingbe | Anthony Soronnadi | Ife Adebara | Olubayo Adekanmbi
Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)

Machine translation (MT) has advanced significantly for high-resource languages, yet specialized domain translation remains a challenge for low-resource languages. This study evaluates the ability of state-of-the-art multilingual models to translate animal health reports from English to Yorùbá, a crucial task for veterinary communication in underserved regions. We curated a dataset of 1,468 parallel sentences and compared multiple MT models in zero-shot and fine-tuned settings. Our findings indicate substantial limitations in their ability to generalize to domain-specific translation, with common errors arising from vocabulary mismatch, training data scarcity, and morphological complexity. Fine-tuning improves performance, particularly for the NLLB 3.3B model, but challenges remain in preserving technical accuracy. These results underscore the need for more targeted approaches to multilingual and culturally aware LLMs for African languages.
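
For context, zero-shot English-to-Yorùbá translation with the public NLLB 3.3B checkpoint might look like the sketch below; the checkpoint ID facebook/nllb-200-3.3B and the example sentence are assumptions, and the paper's fine-tuning data and setup are not reproduced.

```python
# Zero-shot English -> Yorùbá translation with a public NLLB checkpoint.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/nllb-200-3.3B"  # assumed public counterpart of the 3.3B model
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "The cattle show symptoms of foot-and-mouth disease."  # hypothetical example
inputs = tokenizer(text, return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("yor_Latn"),
    max_length=64,
)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```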

Y-NQ: English-Yorùbá Evaluation dataset for Open-Book Reading Comprehension with Open-Ended Questions
Marta R. Costa-jussà | Joy Chen | Ife Adebara | Joe Chuang | Christophe Ropers | Eduardo Sánchez
Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)

The purpose of this work is to share an English-Yorùbá evaluation dataset for open-book reading comprehension with open-ended questions to assess the performance of models in both a high- and a low-resource language. The dataset contains 358 questions and answers on 338 English documents and 208 Yorùbá documents. Experiments show a consistent disparity in performance between the two languages, with Yorùbá falling behind English on automatic metrics even though documents are much shorter in this language. For a small set of documents of comparable length, performance on Yorùbá drops by a factor of 2.5, a comparison validated with human evaluation. When analyzing performance by length, we observe that performance on Yorùbá degrades dramatically for documents that reach 1,500 words, while English performance is barely affected at that length. Our dataset opens the door to testing whether the reading comprehension capabilities of English LLMs extend to Yorùbá, which for the evaluated LLMs is not the case.
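
A minimal sketch of scoring an open-ended answer against a reference with ROUGE-L, a common automatic metric for this setup (that this matches the paper's exact metric suite is an assumption, and the question-answer pair is hypothetical):

```python
# Score a generated answer against a reference answer with ROUGE-L.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)

reference = "The festival is held every year in August."   # hypothetical gold answer
prediction = "It takes place annually in August."          # hypothetical model answer

score = scorer.score(reference, prediction)["rougeL"].fmeasure
print(f"ROUGE-L F1: {score:.3f}")
```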

Proceedings of the 3rd Workshop on Cross-Cultural Considerations in NLP (C3NLP 2025)
Vinodkumar Prabhakaran | Sunipa Dev | Luciana Benotti | Daniel Hershcovich | Yong Cao | Li Zhou | Laura Cabello | Ife Adebara
Proceedings of the 3rd Workshop on Cross-Cultural Considerations in NLP (C3NLP 2025)

2024

Cheetah: Natural Language Generation for 517 African Languages
Ife Adebara | AbdelRahim Elmadany | Muhammad Abdul-Mageed
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Low-resource African languages pose unique challenges for natural language processing (NLP) tasks, including natural language generation (NLG). In this paper, we develop Cheetah, a massively multilingual NLG language model for African languages. Cheetah supports 517 African languages and language varieties, allowing us to address the scarcity of NLG resources and provide a solution to foster linguistic diversity. We demonstrate the effectiveness of Cheetah through comprehensive evaluations across six generation downstream tasks. In five of the six tasks, Cheetah significantly outperforms other models, showcasing its remarkable performance for generating coherent and contextually appropriate text in a wide range of African languages. We additionally conduct a detailed human evaluation to delve deeper into the linguistic capabilities of Cheetah. The findings of this study contribute to advancing NLP research in low-resource settings, enabling greater accessibility and inclusion for African languages in a rapidly expanding digital landscape. We will publicly release our models for research.
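
A minimal usage sketch for generation, assuming the released checkpoint is published on the Hugging Face Hub under an ID like UBC-NLP/cheetah-base and follows a T5-style seq2seq interface (the model ID, input sentence, and prompt format are assumptions, not details from the paper):

```python
# Generation sketch for a T5-style multilingual checkpoint.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "UBC-NLP/cheetah-base"  # assumed Hub ID for the released model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Hypothetical input; the task prefix/format depends on the released model card.
inputs = tokenizer("Bawo ni o se wa?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```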

Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP
Vinodkumar Prabhakaran | Sunipa Dev | Luciana Benotti | Daniel Hershcovich | Laura Cabello | Yong Cao | Ife Adebara | Li Zhou
Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP

Fumbling in Babel: An Investigation into ChatGPT’s Language Identification Ability
Wei-Rui Chen | Ife Adebara | Khai Doan | Qisheng Liao | Muhammad Abdul-Mageed
Findings of the Association for Computational Linguistics: NAACL 2024

ChatGPT has recently emerged as a powerful NLP tool that can carry out a variety of tasks. However, the range of languages ChatGPT can handle remains largely a mystery. To uncover which languages ChatGPT ‘knows’, we investigate its language identification (LID) abilities. For this purpose, we compile Babel-670, a benchmark comprising 670 languages representing 23 language families spoken across five continents. Languages in Babel-670 run the gamut from the very high-resource to the very low-resource. We then study the ability of ChatGPT (both GPT-3.5 and GPT-4) to (i) identify language names and language codes, (ii) under zero- and few-shot conditions, and (iii) with and without provision of a label set. When compared to smaller finetuned LID tools, we find that ChatGPT lags behind; for example, it performs poorly on African languages. We conclude that current large language models would benefit from further development before they can sufficiently serve diverse communities.
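
A sketch of the kind of zero-shot LID query the study describes, written against the current OpenAI Python client (the prompt wording, the four-language label set, and the model name are illustrative assumptions, not the paper's exact templates):

```python
# Zero-shot language identification prompt, with an optional label set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

sample = "Bawo ni ọjọ́ rẹ ṣe lọ?"  # hypothetical input text
label_set = ["Yoruba", "Hausa", "Igbo", "Swahili"]  # omit for the open-label condition

prompt = (
    f"Identify the language of the following text.\n"
    f"Text: {sample}\n"
    f"Choose one of: {', '.join(label_set)}.\n"
    f"Answer with the language name only."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```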

Toucan: Many-to-Many Translation for 150 African Language Pairs
AbdelRahim Elmadany | Ife Adebara | Muhammad Abdul-Mageed
Findings of the Association for Computational Linguistics: ACL 2024

We address a notable gap in Natural Language Processing (NLP) by introducing a collection of resources designed to improve Machine Translation (MT) for low-resource languages, with a specific focus on African languages. First, we introduce two language models (LMs), Cheetah-1.2B and Cheetah-3.7B, with 1.2 billion and 3.7 billion parameters, respectively. Next, we finetune these models to create Toucan, an Afrocentric machine translation model designed to support 156 African language pairs. To evaluate Toucan, we carefully develop an extensive machine translation benchmark, dubbed Afro-Lingu-MT. Toucan significantly outperforms other models, showcasing its remarkable performance on MT for African languages. Finally, we train a new model, spBLEU-1K, to enhance translation evaluation metrics, covering 1K languages, including African languages. This work aims to advance the field of NLP, fostering cross-cultural understanding and knowledge exchange, particularly in regions with limited language resources such as Africa.
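
spBLEU-1K itself is the paper's own contribution, but the underlying recipe, BLEU computed over SentencePiece-tokenized text, can be illustrated with sacrebleu's FLORES-200 tokenizer; substituting flores200 for the paper's 1K-language tokenizer is an assumption made purely for illustration.

```python
# spBLEU-style evaluation: BLEU over SentencePiece-tokenized text.
import sacrebleu

hypotheses = ["Mo fẹ́ràn ìwé yìí gan-an."]   # hypothetical system output
references = [["Mo fẹ́ràn ìwé yìí púpọ̀."]]  # hypothetical reference stream

# tokenize="flores200" applies the FLORES-200 SentencePiece model (spBLEU).
score = sacrebleu.corpus_bleu(hypotheses, references, tokenize="flores200")
print(f"spBLEU: {score.score:.2f}")
```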

Interplay of Machine Translation, Diacritics, and Diacritization
Wei-Rui Chen | Ife Adebara | Muhammad Abdul-Mageed
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

We investigate two research questions: (1) how machine translation (MT) and diacritization influence each other's performance in a multi-task learning setting, and (2) the effect of keeping (vs. removing) diacritics on MT performance. We examine these two questions in both high-resource (HR) and low-resource (LR) settings across 55 different languages (36 African languages and 19 European languages). For (1), results show that diacritization significantly benefits MT in the LR scenario, doubling or even tripling performance for some languages, but harms MT in the HR scenario. We find that MT harms diacritization in the LR setting but benefits it significantly in the HR setting for some languages. For (2), MT performance is similar regardless of whether diacritics are kept or removed. In addition, we propose two classes of metrics to measure the complexity of a diacritical system, finding these metrics to correlate positively with the performance of our diacritization models. Overall, our work provides insights for developing MT and diacritization systems under different data size conditions and may have implications that generalize beyond the 55 languages we investigate.
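
The abstract does not spell out the two classes of complexity metrics; as one hypothetical illustration of quantifying diacritical complexity, the ratio of combining marks to base characters after Unicode decomposition can be computed as follows:

```python
# A hypothetical diacritic-density measure (not the paper's actual metrics).
import unicodedata

def diacritic_density(text: str) -> float:
    """Ratio of combining marks (Unicode category Mn) to base characters."""
    nfd = unicodedata.normalize("NFD", text)  # split chars into base + marks
    marks = sum(1 for ch in nfd if unicodedata.category(ch) == "Mn")
    bases = sum(1 for ch in nfd if unicodedata.combining(ch) == 0)
    return marks / max(bases, 1)

print(diacritic_density("Ọjọ́ àìkú ni à ń sin"))   # heavily diacritized Yorùbá
print(diacritic_density("The day of rest"))        # undiacritized English -> 0.0
```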

2023

SERENGETI: Massively Multilingual Language Models for Africa
Ife Adebara | AbdelRahim Elmadany | Muhammad Abdul-Mageed | Alcides Alcoba Inciarte
Findings of the Association for Computational Linguistics: ACL 2023

Multilingual pretrained language models (mPLMs) acquire valuable, generalizable linguistic information during pretraining and have advanced the state of the art on task-specific finetuning. To date, only ~31 out of ~2,000 African languages are covered in existing language models. We ameliorate this limitation by developing SERENGETI, a set of massively multilingual language models that cover 517 African languages and language varieties. We evaluate our novel models on eight natural language understanding tasks across 20 datasets, comparing to 4 mPLMs that cover 4-23 African languages. SERENGETI outperforms other models on 11 datasets across the eight tasks, achieving an average F1 of 82.27. We also perform analyses of errors from our models, which allow us to investigate the influence of language genealogy and linguistic similarity when the models are applied under zero-shot settings. We will publicly release our models for research.
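
A minimal sketch of task-specific finetuning of such an encoder checkpoint on one NLU task, assuming a Hub ID like UBC-NLP/serengeti (the model ID, the three-way label scheme, and the toy example are assumptions):

```python
# Sequence-classification finetuning sketch for an mPLM checkpoint.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "UBC-NLP/serengeti"  # assumed Hub ID for the released model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)
model.train()

# Toy batch: one sentence with a (hypothetical) sentiment label.
batch = tokenizer(["Ẹ kú iṣẹ́ o!"], return_tensors="pt", padding=True)
labels = torch.tensor([2])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)  # cross-entropy loss computed internally
outputs.loss.backward()
optimizer.step()
print(f"loss: {outputs.loss.item():.4f}")
```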

UBC-DLNLP at SemEval-2023 Task 12: Impact of Transfer Learning on African Sentiment Analysis
Gagan Bhatia | Ife Adebara | AbdelRahim Elmadany | Muhammad Abdul-Mageed
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

We describe our contribution to the SemEval-2023 AfriSenti-SemEval shared task, where we tackle the task of sentiment analysis in 14 different African languages. We develop both monolingual and multilingual models under a fully supervised setting (subtasks A and B). We also develop models for the zero-shot setting (subtask C). Our approach involves experimenting with transfer learning using six language models, including further pretraining of some of these models as well as a final finetuning stage. Our best performing models achieve an F1-score of 70.36 on development data and an F1-score of 66.13 on test data. Unsurprisingly, our results demonstrate the effectiveness of transfer learning and finetuning techniques for sentiment analysis across multiple languages. Our approach can be applied to other sentiment analysis tasks in different languages and domains.

2022

Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go
Ife Adebara | Muhammad Abdul-Mageed
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Aligning with the ACL 2022 special theme on “Language Diversity: from Low Resource to Endangered Languages”, we discuss the major linguistic and sociopolitical challenges facing the development of NLP technologies for African languages. Situating African languages in a typological framework, we discuss how the particulars of these languages can be harnessed. To facilitate future research, we also highlight current efforts, communities, venues, datasets, and tools. Our main objective is to motivate and advocate for an Afrocentric approach to technology development. With this in mind, we recommend what technologies to build and how to build, evaluate, and deploy them based on the needs of local African communities.

Linguistically-Motivated Yorùbá-English Machine Translation
Ife Adebara | Muhammad Abdul-Mageed | Miikka Silfverberg
Proceedings of the 29th International Conference on Computational Linguistics

Translating between languages where certain features are marked morphologically in one but absent or marked contextually in the other is an important test case for machine translation. When translating into English, which marks (in)definiteness morphologically, from Yorùbá, which uses bare nouns but marks these features contextually, ambiguities arise. In this work, we perform a fine-grained analysis of how an SMT system compares with two NMT systems (BiLSTM and Transformer) when translating bare nouns (BNs) in Yorùbá into English. We investigate to what extent the systems identify BNs, translate them correctly, and compare with human translation patterns. We also analyze the type of errors each model makes and provide a linguistic description of these errors. We glean insights for evaluating model performance in low-resource settings. In translating bare nouns, our results show the Transformer model outperforms the SMT and BiLSTM models for 4 categories, the BiLSTM outperforms the SMT model for 3 categories, while the SMT outperforms the NMT models for 1 category.

AfroLID: A Neural Language Identification Tool for African Languages
Ife Adebara | AbdelRahim Elmadany | Muhammad Abdul-Mageed | Alcides Inciarte
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world’s 7000+ languages today are not covered by LID technologies. We address this pressing issue for Africa by introducing AfroLID, a neural LID toolkit for 517 African languages and varieties. AfroLID exploits a multi-domain web dataset manually curated from across 14 language families utilizing five orthographic systems. When evaluated on our blind test set, AfroLID achieves an F1-score of 95.89. We also compare AfroLID to five existing LID tools that each cover a small number of African languages, finding it to outperform them on most languages. We further show the utility of AfroLID in the wild by testing it on the acutely under-served Twitter domain. Finally, we offer a number of controlled case studies and perform a linguistically-motivated error analysis that allow us to showcase both AfroLID’s powerful capabilities and its limitations.
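
AfroLID's neural architecture is not reproduced here; as a generic illustration of the supervised LID task it tackles, a character n-gram baseline can be sketched with scikit-learn (the toy sentences and ISO-style labels are hypothetical placeholders, and a real LID system needs far more training text):

```python
# Character n-gram LID baseline (an illustration, not AfroLID's architecture).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data: one sentence each of Yorùbá, Hausa, Swahili.
texts = ["Bawo ni o se wa?", "Yaya kake yau?", "Habari za asubuhi?"]
labels = ["yor", "hau", "swa"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["Asante sana, rafiki yangu."]))  # classify an unseen sentence
```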

2021

Improving Similar Language Translation With Transfer Learning
Ife Adebara | Muhammad Abdul-Mageed
Proceedings of the Sixth Conference on Machine Translation

We investigate transfer learning based on pretrained neural machine translation models to translate between (low-resource) similar languages. This work is part of our contribution to the WMT 2021 Similar Languages Translation Shared Task, where we submitted models for different language pairs, including French-Bambara, Spanish-Catalan, and Spanish-Portuguese in both directions. Our models for Catalan-Spanish (82.79 BLEU) and Portuguese-Spanish (87.11 BLEU) rank first in the official shared task evaluation, and we are the only team to submit models for the French-Bambara pairs.

2020

Translating Similar Languages: Role of Mutual Intelligibility in Multilingual Transformers
Ife Adebara | El Moatez Billah Nagoudi | Muhammad Abdul-Mageed
Proceedings of the Fifth Conference on Machine Translation

In this work, we investigate different approaches to translating between similar languages despite low-resource limitations. This work was done as the participation of the UBC NLP research group in the WMT 2020 Similar Languages Translation Shared Task. We participated in all language pairs and performed various experiments. We used a transformer architecture for all the models and used back-translation for one of the language pairs. We explore both bilingual and multilingual approaches. We describe the pre-processing, training, translation, and results for each model. We also investigate the role of mutual intelligibility in model performance.
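
One detail worth unpacking is the back-translation step: monolingual target-side sentences are translated into the source language with a reverse model, yielding synthetic parallel pairs that augment the genuine training data. A minimal sketch, with translate_reverse as a purely hypothetical stand-in for whatever reverse model is used:

```python
# Back-translation sketch: build synthetic source-target pairs from
# monolingual target-side text using a reverse (target -> source) model.
from typing import Callable, List, Tuple

def back_translate(
    target_sentences: List[str],
    translate_reverse: Callable[[str], str],  # placeholder reverse model
) -> List[Tuple[str, str]]:
    """Return (synthetic_source, real_target) pairs for training augmentation."""
    pairs = []
    for tgt in target_sentences:
        synthetic_src = translate_reverse(tgt)  # target -> source
        pairs.append((synthetic_src, tgt))
    return pairs

# Usage: mix the returned pairs with the genuine parallel data and train the
# forward (source -> target) model on the combined corpus.
```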