Mardiyyah Oduwole


2026

Current guardian models are predominantly Western-centric and optimized for high-resource languages, leaving low-resource African languages vulnerable to evolving harms, cross-lingual failures, and cultural misalignment. Moreover, most guardian models rely on rigid, predefined safety categories that fail to generalize across diverse linguistic and sociocultural contexts. Achieving robust safety requires flexible, runtime-enforceable policies and benchmarks that reflect local norms, harm scenarios, and cultural expectations. We introduce UbuntuGuard, the first policy-based safety benchmark for African languages built from adversarial queries authored by 155 domain experts across sensitive fields, including healthcare. From these expert-crafted queries, we derive context-specific safety policies and reference responses that capture culturally grounded risk signals, enabling policy-aligned evaluation of guardian models. We evaluate 15 models, comprising seven general-purpose LLMs and eight guardian models across three distinct variants: static, dynamic, and multilingual. Our findings reveal that existing English-centric benchmarks overestimate real-world multilingual safety, cross-lingual transfer provides partial but insufficient coverage, and dynamic models, while better equipped to leverage policies at inference time, still struggle to fully localize African-language contexts. These findings highlight the urgent need for multilingual, culturally grounded safety benchmarks to enable the development of reliable and equitable guardian models for low-resource languages.
Multimodal AI research has overwhelmingly focused on high-resource languages, hindering the democratization of advancements in the field. To address this, we present AfriCaption, a comprehensive framework for multilingual image captioning in 20 African languages and our contributions are threefold: (i) a curated dataset built on Flickr8k, featuring semantically aligned captions generated via a context-aware selection and translation process; (ii) a dynamic, context-preserving pipeline that ensures ongoing quality through model ensembling and adaptive substitution; and (iii) the AfriCaption model, a 0.5B parametervision-to-text architecture that integrates SigLIP and NLLB200 for caption generation across underrepresented languages. This unified framework ensures ongoing data quality and establishes the first scalable image-captioning resource for underrepresented African languages, laying the groundwork for truly inclusive multimodal AI.

2023

Detecting harmful content on social media plat-forms is crucial in preventing the negative ef-fects these posts can have on social media users. This paper presents our methodology for tack-ling task 10 from SemEval23, which focuseson detecting and classifying online sexism insocial media posts. We constructed our solu-tion using an ensemble of transformer-basedmodels (that have been fine-tuned; BERTweet,RoBERTa, and DeBERTa). To alleviate the var-ious issues caused by the class imbalance inthe dataset provided and improve the general-ization of our model, our framework employsdata augmentation and semi-supervised learn-ing. Specifically, we use back-translation fordata augmentation in two scenarios: augment-ing the underrepresented class and augment-ing all classes. In this study, we analyze theimpact of these different strategies on the sys-tem’s overall performance and determine whichtechnique is the most effective. Extensive ex-periments demonstrate the efficacy of our ap-proach. For sub-task A, the system achievedan F1-score of 0.8613. The source code to re-produce the proposed solutions is available onGithub
MasakhaNEWS: News Topic Classification for African languages
David Ifeoluwa Adelani | Marek Masiak | Israel Abebe Azime | Jesujoba Alabi | Atnafu Lambebo Tonja | Christine Mwase | Odunayo Ogundepo | Bonaventure F. P. Dossou | Akintunde Oladipo | Doreen Nixdorf | Chris Chinenye Emezue | Sana Al-azzawi | Blessing Sibanda | Davis David | Lolwethu Ndolela | Jonathan Mukiibi | Tunde Ajayi | Tatiana Moteu | Brian Odhiambo | Abraham Owodunni | Nnaemeka Obiefuna | Muhidin Mohamed | Shamsuddeen Hassan Muhammad | Teshome Mulugeta Ababu | Saheed Abdullahi Salahudeen | Mesay Gemeda Yigezu | Tajuddeen Gwadabe | Idris Abdulmumin | Mahlet Taye | Oluwabusayo Awoyomi | Iyanuoluwa Shode | Tolulope Adelani | Habiba Abdulganiyu | Abdul-Hakeem Omotayo | Adetola Adeeko | Abeeb Afolabi | Anuoluwapo Aremu | Olanrewaju Samuel | Clemencia Siro | Wangari Kimotho | Onyekachi Ogbu | Chinedu Mbonu | Chiamaka Chukwuneke | Samuel Fanijo | Jessica Ojo | Oyinkansola Awosan | Tadesse Kebede | Toadoum Sari Sakayo | Pamela Nyatsine | Freedmore Sidume | Oreen Yousuf | Mardiyyah Oduwole | Kanda Tshinu | Ussen Kimanuka | Thina Diko | Siyanda Nxakama | Sinodos Nigusse | Abdulmejid Johar | Shafie Mohamed | Fuad Mire Hassan | Moges Ahmed Mehamed | Evrard Ngabire | Jules Jules | Ivan Ssenkungu | Pontus Stenetorp
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Search Fix author