Muhidin A. Mohamed
2026
Morphologically-informed Somali Lemmatization Corpus built with a Web-based Crowdsourcing Platform
Abdifatah Ahmed Gedi | Shafie Abdi Mohamed | Yusuf A. Yusuf | Muhidin A. Mohamed | Fuad Mire Hassan | Houssein A Assowe
Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026)
Lemmatization, which reduces words to their root forms, plays a key role in tasks such as information retrieval, text indexing, and machine-learning-based language models. However, a key research challenge for low-resource languages such as Somali is the lack of human-annotated lemmatization datasets and reliable ground truth to underpin accurate morphological analysis and the training of relevant NLP models. To address this problem, we developed the first large-scale, purpose-built Somali lemmatization lexicon, coupled with a crowdsourcing platform for ongoing expansion. The system leverages Somali's agglutinative and derivational morphology, encompassing over 5,584 root words and 78,629 derivative forms, each annotated with a part-of-speech tag. For data validation, we devised a pilot lexicon-based lemmatizer integrated with rule-based logic to handle out-of-vocabulary terms. Evaluation on a 294-document corpus covering news articles, social media posts, and short messages shows lemmatization accuracies of 51.27% for full articles, 44.14% for excerpts, and 59.51% for short texts such as tweets. These results demonstrate that combining lexical resources, POS tagging, and rule-based strategies provides a robust and scalable framework for addressing morphological complexity in Somali and other low-resource languages.
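The pilot lemmatizer described above pairs lexicon lookup with a rule-based fallback for out-of-vocabulary words. A minimal sketch of that general pattern, under stated assumptions, is shown below. The toy lexicon, root list, and suffix rules (`LEXICON`, `ROOTS`, `SUFFIXES`) are hypothetical placeholders chosen for illustration and are not the paper's data or rules; `lemmatize` and `accuracy` are likewise hypothetical helper names.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: derivative form -> (root, POS tag).
# Illustrative entries only; not drawn from the paper's lexicon.
LEXICON: Dict[str, Tuple[str, str]] = {
    "guriga": ("guri", "NOUN"),
}

# Hypothetical toy root list with POS tags (stands in for the annotated roots).
ROOTS: Dict[str, str] = {
    "guri": "NOUN",
    "buug": "NOUN",
}

# Hypothetical suffix rules for out-of-vocabulary words; the paper's
# actual rule set is not described in the abstract.
SUFFIXES: List[str] = ["ka", "ta", "ga", "da", "ha"]


def lemmatize(word: str) -> Tuple[str, Optional[str], str]:
    """Return (lemma, POS tag or None, method used)."""
    w = word.lower()
    # 1. Exact lexicon lookup for known derivative forms.
    if w in LEXICON:
        root, pos = LEXICON[w]
        return root, pos, "lexicon"
    # 2. The word may already be a root.
    if w in ROOTS:
        return w, ROOTS[w], "lexicon"
    # 3. Rule-based fallback: try longer suffixes first and accept a
    #    stripped stem only if it is a known root.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(suffix):
            stem = w[: -len(suffix)]
            if stem in ROOTS:
                return stem, ROOTS[stem], "rule"
    # 4. No match: return the word unchanged with no POS tag.
    return w, None, "unknown"


def accuracy(gold: List[Tuple[str, str]]) -> float:
    """Fraction of (word, gold_lemma) pairs whose predicted lemma matches."""
    if not gold:
        return 0.0
    correct = sum(1 for word, lemma in gold if lemmatize(word)[0] == lemma)
    return correct / len(gold)
```

For example, `lemmatize("buugga")` misses the toy lexicon, strips the hypothetical suffix `-ga`, finds `buug` among the roots, and returns `("buug", "NOUN", "rule")`. Requiring the stripped stem to be a known root keeps the fallback conservative: an unrecognized word is returned unchanged rather than truncated into a non-word.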
Afri-MCQA: Multimodal Cultural Question Answering for African Languages
Atnafu Lambebo Tonja | Srija Anand | Emilio Villa-Cueva | Israel Abebe Azime | Jesujoba Oluwadara Alabi | Muhidin A. Mohamed | Debela Desalegn Yadeta | Negasi Haile Abadi | Abigail Oppong | Nnaemeka Casmir Obiefuna | Idris Abdulmumin | Naome A Etori | Eric Peter Wairagala | Kanda Patrick Tshinu | Imanigirimbabazi Emmanuel | Gabofetswe Malema | Alham Fikri Aji | David Ifeoluwa Adelani | Thamar Solorio
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Africa is home to over one-third of the world's languages, yet remains severely underrepresented in multimodal AI research. We introduce Afri-MCQA, the first Multilingual Cultural Question-Answering benchmark containing 7.5k Q&A pairs across 15 African languages from 12 countries. The benchmark offers parallel text and speech modalities and was entirely created by native speakers. We find that models perform poorly across the evaluated cultures, with near-zero accuracy on open-ended VQA when queried in a native language or through speech. To assess linguistic competence separately from cultural knowledge, we include control experiments, and we observe significant performance gaps between native languages and English for both text and speech. These findings underscore the pressing need for speech-first approaches, culturally grounded pretraining, and cross-lingual cultural transfer. We release Afri-MCQA to support more inclusive multimodal AI development.
Co-authors
- Negasi Haile Abadi 1
- Idris Abdulmumin 1
- David Ifeoluwa Adelani 1
- Alham Fikri Aji 1
- Jesujoba Alabi 1
- Srija Anand 1
- Houssein A Assowe 1
- Israel Abebe Azime 1
- Imanigirimbabazi Emmanuel 1
- Naome A. Etori 1
- Abdifatah Ahmed Gedi 1
- Fuad Mire Hassan 1
- Gabofetswe Malema 1
- Shafie Abdi Mohamed 1
- Nnaemeka Casmir Obiefuna 1
- Abigail Oppong 1
- Thamar Solorio 1
- Atnafu Lambebo Tonja 1
- Kanda Patrick Tshinu 1
- Emilio Villa-Cueva 1
- Eric Peter Wairagala 1
- Debela Desalegn Yadeta 1
- Yusuf A. Yusuf 1