2025
HausaNLP: Current Status, Challenges and Future Directions for Hausa Natural Language Processing
Shamsuddeen Hassan Muhammad | Ibrahim Said Ahmad | Idris Abdulmumin | Falalu Ibrahim Lawan | Sukairaj Hafiz Imam | Yusuf Aliyu | Sani Abdullahi Sani | Ali Usman Umar | Tajuddeen Gwadabe | Kenneth Church | Vukosi Marivate
Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)
Hausa Natural Language Processing (NLP) has gained increasing attention in recent years, yet Hausa remains understudied as a low-resource language despite having over 120 million first-language (L1) and 80 million second-language (L2) speakers worldwide. While significant advances have been made for high-resource languages, Hausa NLP faces persistent challenges, including limited open-source datasets and inadequate representation in models. This paper presents an overview of the current state of Hausa NLP, systematically examining existing resources, research contributions, and gaps across fundamental NLP tasks: text classification, machine translation, named entity recognition, speech recognition, and question answering. We introduce HausaNLP, a curated catalog that aggregates datasets, tools, and research works to improve accessibility and drive further development. Furthermore, we discuss the challenges of integrating Hausa into large language models (LLMs), including suboptimal tokenization and dialectal variation. Finally, we propose strategic research directions emphasizing dataset expansion, improved language modeling approaches, and strengthened community collaboration to advance Hausa NLP. Our work provides both a foundation for accelerating progress in Hausa NLP and insights for broader multilingual NLP research.
2024
HausaNLP at SemEval-2024 Task 1: Textual Relatedness Analysis for Semantic Representation of Sentences
Saheed Abdullahi Salahudeen | Falalu Ibrahim Lawan | Yusuf Aliyu | Amina Abubakar | Lukman Aliyu | Nur Rabiu | Mahmoud Ahmad | Aliyu Rabiu Shuaibu | Alamin Musa
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Semantic Textual Relatedness (STR), a measure of how closely the meanings of two pieces of text are related, has become a key focus in Natural Language Processing (NLP). We describe SemEval-2024 Task 1 on Semantic Textual Relatedness, which features three tracks (supervised learning, unsupervised learning, and cross-lingual learning) across African and Asian languages, including Afrikaans, Algerian Arabic, Amharic, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Punjabi, Spanish, and Telugu. Our goal is to analyse the semantic representation of sentences for textual relatedness using mBERT, all-MiniLM-L6-v2, and bert-base-uncased. The effectiveness of these models is evaluated with the Spearman rank correlation coefficient, which measures the strength of the monotonic relationship between paired scores (here, predicted and gold relatedness). The findings show that transformer models are viable for multilingual STR tasks.
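The abstract does not say how relatedness scores were produced from the encoders, so the following is only a minimal sketch of a common setup, under stated assumptions: each sentence is assumed to be already encoded as a vector (for example by a sentence-transformer such as all-MiniLM-L6-v2), relatedness is assumed to be the cosine similarity of the two vectors, and the predictions are scored against gold labels with Spearman's rho, computed as the Pearson correlation of average ranks so that ties are handled. The toy vectors and gold scores are hypothetical; in practice `scipy.stats.spearmanr` computes the same statistic.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (assumed scoring rule)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j become ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(ss_x * ss_y)
    if denom == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / denom


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy embeddings for three sentence pairs.
    pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 1.0])]
    predicted = [cosine_similarity(a, b) for a, b in pairs]
    gold = [0.9, 0.5, 0.1]  # hypothetical gold relatedness labels
    print(spearman(predicted, gold))
```

Because Spearman's rho depends only on ranks, any monotonic rescaling of the predicted scores leaves it unchanged, which is why it suits comparing raw cosine similarities with human relatedness judgements on a different scale.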