2025
Who Wrote This? Identifying Machine vs Human-Generated Text in Hausa
Babangida Sani | Aakansha Soy | Sukairaj Hafiz Imam | Ahmad Mustapha | Lukman Jibril Aliyu | Idris Abdulmumin | Ibrahim Said Ahmad | Shamsuddeen Hassan Muhammad
Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)
The advancement of large language models (LLMs) has made them proficient in various tasks, including content generation. However, their unregulated use can enable malicious activities such as plagiarism and the generation and spread of fake news, especially in low-resource languages. Most existing machine-generated text detectors are trained on high-resource languages such as English and French. In this study, we developed the first large-scale detector that can distinguish between human- and machine-generated content in Hausa. We scraped seven Hausa-language media outlets for the human-generated text and used the Gemini 2.0 Flash model to automatically generate corresponding Hausa-language articles from the human-written article headlines. We fine-tuned four pre-trained African-centric models (AfriTeVa, AfriBERTa, AfroXLMR, and AfroXLMR-76L) on the resulting dataset and assessed their performance using accuracy and F1-score. AfroXLMR achieved the highest performance, with an accuracy of 99.23% and an F1 score of 99.21%, demonstrating its effectiveness for Hausa text detection. Our dataset is made publicly available to enable further research.
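As a concrete illustration of the fine-tuning setup the abstract describes, the sketch below adapts an AfroXLMR checkpoint as a binary human-vs-machine classifier with Hugging Face Transformers. The checkpoint name (Davlan/afro-xlmr-base), the CSV file names, and the hyperparameters are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch: fine-tune an AfroXLMR checkpoint to classify Hausa text as
# human-written (0) or machine-generated (1). Checkpoint, file names, and
# hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "Davlan/afro-xlmr-base"  # assumed checkpoint; AfriBERTa or AfriTeVa would be swapped in the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Hypothetical CSV files with columns "text" (Hausa article) and "label" (0 = human, 1 = machine).
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

data = data.map(tokenize, batched=True)

args = TrainingArguments(output_dir="hausa-mgt-detector",
                         per_device_train_batch_size=16,
                         num_train_epochs=3)

trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"],
                  eval_dataset=data["test"],
                  tokenizer=tokenizer)
trainer.train()
print(trainer.evaluate())  # reports eval loss; accuracy/F1 need a compute_metrics function
```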
Automatic Speech Recognition for African Low-Resource Languages: Challenges and Future Directions
Sukairaj Hafiz Imam | Babangida Sani | Dawit Ketema Gete | Bedru Yimam Ahmed | Ibrahim Said Ahmad | Idris Abdulmumin | Seid Muhie Yimam | Muhammad Yahuza Bello | Shamsuddeen Hassan Muhammad
Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)
Automatic Speech Recognition (ASR) technologies have transformed human-computer interaction; however, low-resource languages in Africa remain significantly underrepresented in both research and practical applications. This study investigates the major challenges hindering the development of ASR systems for these languages, which include data scarcity, linguistic complexity, limited computational resources, acoustic variability, and ethical concerns surrounding bias and privacy. The primary goal is to critically analyze these barriers and identify practical, inclusive strategies to advance ASR technologies within the African context. Recent advances and case studies emphasize promising strategies such as community-driven data collection, self-supervised and multilingual learning, lightweight model architectures, and techniques that prioritize privacy. Evidence from pilot projects involving various African languages showcases the feasibility and impact of customized solutions, which encompass morpheme-based modeling and domain-specific ASR applications in sectors like healthcare and education. The findings highlight the importance of interdisciplinary collaboration and sustained investment to tackle the distinct linguistic and infrastructural challenges faced by the continent. This study offers a progressive roadmap for creating ethical, efficient, and inclusive ASR systems that not only safeguard linguistic diversity but also improve digital accessibility and promote socioeconomic participation for speakers of African languages.
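One of the strategies the survey highlights, self-supervised multilingual learning, typically means adapting a massively multilingual speech encoder to a target language with a small labeled corpus. The sketch below shows that setup with a wav2vec 2.0 XLS-R checkpoint and a CTC head; the checkpoint name, the character vocabulary file, and the audio parameters are assumptions for illustration rather than any specific system from the paper.

```python
# Minimal sketch, assuming facebook/wav2vec2-xls-r-300m as the multilingual
# self-supervised encoder and a hypothetical character-level vocab.json for the
# target African language; actual fine-tuning data and training loop are omitted.
from transformers import (Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor,
                          Wav2Vec2ForCTC, Wav2Vec2Processor)

tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]",
                                 pad_token="[PAD]", word_delimiter_token="|")
feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16000,
                                             padding_value=0.0, do_normalize=True,
                                             return_attention_mask=True)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",  # assumed multilingual pretrained encoder
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_encoder()  # keep low-level acoustic layers fixed during fine-tuning
```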
2024
Arewa NLP’s Participation at WMT24
Mahmoud Ahmad | Auwal Khalid | Lukman Aliyu | Babangida Sani | Mariya Abdullahi
Proceedings of the Ninth Conference on Machine Translation
This paper presents the work of our team, “ArewaNLP,” for the WMT 2024 shared task, describing the system submitted to the Ninth Conference on Machine Translation (WMT24). We participated in the English-Hausa text-only translation task. We fine-tuned the OPUS-MT-en-ha transformer model, and our submission achieved competitive results, with BLEU scores of 27.76, 40.31, and 5.85 on the Development Test, Evaluation Test, and Challenge Test sets, respectively.
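For readers unfamiliar with the setup, the sketch below shows how an OPUS-MT English-Hausa checkpoint can be loaded, used to translate, and scored with BLEU via sacreBLEU. The checkpoint name (Helsinki-NLP/opus-mt-en-ha) is an assumption about the released model, and the example sentence, reference, and omission of the actual fine-tuning loop are illustrative only.

```python
# Minimal sketch: translate with an OPUS-MT en-ha checkpoint and score with BLEU.
# Checkpoint name, example sentence, and reference are illustrative assumptions;
# the fine-tuning itself would follow the standard seq2seq Trainer recipe.
import sacrebleu
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "Helsinki-NLP/opus-mt-en-ha"  # assumed base model before fine-tuning
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

sources = ["The children are going to school."]
inputs = tokenizer(sources, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_new_tokens=128)
hypotheses = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# BLEU against reference translations, as in the shared-task evaluation.
references = [["Yaran suna zuwa makaranta."]]  # hypothetical reference translation
print(sacrebleu.corpus_bleu(hypotheses, references).score)
```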