Negasi Haile Abadi


2026

Africa is home to over one-third of the world’s languages, yet remains severely underrepresented in multimodal AI research. We introduce Afri-MCQA, the first Multilingual Cultural Question-Answering benchmark containing 7.5k Q A pairs across 15 African languages from 12 countries. The benchmark offers parallel text and speech modalities and was entirely created by native speakers. We find that models show poor performance across evaluated cultures, with near-zero accuracy on open-ended VQA when queried through native language or speech. To test linguistic competence, we include control experiments meant to assess this specific aspect separate from cultural knowledge, and we observe significant performance gaps between native languages and English for both text and speech. These findings underscore the pressing need for speech-first approaches, culturally grounded pretraining, and cross-lingual cultural transfer. We release Afri-MCQA to support more inclusive multimodal AI development.

2025

Machine Translation errors in high-stakes settings like healthcare pose unique risks that could lead to clinical harm. The challenges are even more pronounced for low-resourced languages where human translators are scarce and MT tools perform poorly. In this work, we provide a taxonomy of Machine Translation errors for the healthcare domain using a publicly available MT system. Preparing an evaluation dataset from pre-existing medical datasets, we conduct our study focusing on two low-resourced languages: Amharic and Tigrinya. Based on our error analysis and findings from prior work, we test two pre-translation interventions–namely, paraphrasing the source sentence and pivoting with a related language– for their effectiveness in reducing clinical risk. We find that MT errors for healthcare most commonly happen when the source sentence includes medical terminology and procedure descriptions, synonyms, figurative language, and word order differences. We find that pre-translation interventions are not effective in reducing clinical risk if the base translation model performs poorly. Based on our findings, we provide recommendations for improving MT for healthcare.
Homophone normalization–where characters that have the same sound in a writing script are mapped to one character–is a pre-processing step applied in Amharic Natural Language Processing (NLP) literature. While this may improve performance reported by automatic metrics, it also results in models that are unable to effectively process different forms of writing in a single language. Further, there might be impacts in transfer learning, where models trained on normalized data do not generalize well to other languages. In this paper, we experiment with monolingual training and cross-lingual transfer to understand the impacts of normalization on languages that use the Ge’ez script. We then propose a post-inference intervention in which normalization is applied to model predictions instead of training data. With our simple scheme of post-inference normalization, we show that we can achieve an increase in BLEU score of up to 1.03 while preserving language features in training.