Lukman Aliyu

2026

Team HausaNLP at SemEval-2026 Task 9: Tackling Class Imbalance in Low-Resource Hausa Polarization Detection
Faisal Adam | Sani Aji | Lukman Aliyu | Abdulhamid Abubakar
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper describes our submission to SemEval-2026 Task 9, Subtask 2 (Hausa). The task involves identifying specific categories of polarization (Political, Religious, Ethnic, etc.) in Hausa social media comments. The dataset presented significant challenges, primarily extreme class imbalance and the low-resource nature of the language. Our system uses a pretrained multilingual transformer (Afro-XLMR-Large) fine-tuned with Weighted Binary Cross-Entropy loss and dynamic undersampling (1:3 ratio) to mitigate the scarcity of polarized examples. On the official test set, our system achieved an official Macro-F1 score of 0.2346 and a Micro-F1 score of 0.2581. Our model is recall-oriented (Micro-Recall: 0.6166), demonstrating strong capability in detecting polarization, though precision remains a challenge (0.1632). We achieved our best per-class performance in the Political domain (F1: 0.48).
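
The two imbalance techniques named in this abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the positive-class weight, and the reading of "1:3" as one polarized example to at most three non-polarized examples are all assumptions made for illustration.

```python
import math
import random


def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy with the positive term scaled by pos_weight.

    pos_weight is a hypothetical hyperparameter; the abstract does not state
    the value the authors used.
    """
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def undersample(examples, neg_per_pos=3, seed=0):
    """Keep every positive and at most neg_per_pos negatives per positive.

    examples is a list of (text, label) pairs, with label 1 = polarized.
    """
    rng = random.Random(seed)
    pos = [ex for ex in examples if ex[1] == 1]
    neg = [ex for ex in examples if ex[1] == 0]
    keep = min(len(neg), neg_per_pos * len(pos))
    sample = pos + rng.sample(neg, keep)
    rng.shuffle(sample)
    return sample
```

One plausible reading of "dynamic" is that the negatives are redrawn every epoch, which in this sketch would correspond to calling `undersample` with a different `seed` each time; the abstract does not confirm this detail.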

Team HausaNLP at SemEval-2026 Task 4: Narratives via Semantic Embeddings
Faisal Adam | Lukman Aliyu | Sani Aji
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper presents Team HausaNLP’s submission to SemEval-2026 Task 4 (Track A), which requires identifying the more narratively similar of two candidate stories relative to an anchor. Narrative similarity is defined along three dimensions: abstract theme, course of action, and story outcomes. We conduct a systematic ablation comparing five approaches: a lexical TF-IDF baseline, two bi-encoder SBERT variants (all-MiniLM-L6-v2 and all-mpnet-base-v2), a paraphrase-focused embedding model, and a cross-encoder re-ranker. On the 200-instance development set, all-mpnet-base-v2 achieves the best performance (61.5% accuracy, 61.48 macro-F1), outperforming both TF-IDF (54.5%) and the official SBERT baseline (55.0%). Surprisingly, the cross-encoder re-ranker (55.5%) does not improve on the bi-encoders, which we attribute to the long-document nature of Wikipedia story summaries exceeding the model’s effective context window. On the official test set, our primary SBERT MiniLM submission achieved 61.50% accuracy (33rd of 44 teams). Our error analysis over 200 development instances identifies five systematic failure categories, distinct from the All Correct / Partial cases, including 23 Lexical Trap cases, 23 Hard Cases, and 24 Proposed-Recovery cases, thereby informing concrete directions for future work.
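
The bi-encoder decision rule described in this abstract (embed the anchor and both candidates, then pick the candidate closer to the anchor) can be sketched as follows. The embedding step is omitted: the vectors are assumed to come from some sentence encoder, and the function names and tie-breaking rule are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_more_similar(anchor_vec, cand_a_vec, cand_b_vec):
    """Return "A" or "B" for the candidate closer to the anchor; ties go to "A" (an assumption)."""
    sim_a = cosine(anchor_vec, cand_a_vec)
    sim_b = cosine(anchor_vec, cand_b_vec)
    return "A" if sim_a >= sim_b else "B"
```

With toy two-dimensional vectors, `pick_more_similar([1, 0], [0.9, 0.1], [0, 1])` selects candidate A, because its direction is nearly identical to the anchor's.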

Team faisalm3 at SemEval-2026 Task 3: From Standard Regression to Distributional Alignment in Dimensional Sentiment Analysis
Faisal Adam | Lukman Aliyu | Sani Aji | Abdulhamid Abubakar | Aliyu Rabiu Shuaibu
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper describes our participation in SemEval-2026 Task 3: Dimensional Aspect-Based Sentiment Analysis (DimABSA) (Yu et al., 2026). We utilized a pre-trained DeBERTa-V3 backbone to capture semantic meaning through disentangled attention. While standard Mean Squared Error (MSE) loss establishes a performance floor, we propose a Hybrid MSE-CCC Loss to identify distributional relationships that simple regression missed. Our results demonstrate a 54.6% reduction in validation loss compared to the baseline, significantly improving detection in high-intensity emotional bins by mitigating the "regression to the mean" phenomenon.
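
A hybrid of MSE and the concordance correlation coefficient (CCC) can be written in a few lines. The sketch below uses the standard CCC definition; the mixing weight `alpha` and the form `alpha * MSE + (1 - alpha) * (1 - CCC)` are assumptions, since the abstract does not give the exact formulation used in the paper.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def ccc(pred, target):
    """Concordance correlation coefficient (population variances)."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    denom = var_p + var_t + (mean_p - mean_t) ** 2
    if denom == 0.0:
        # Both sequences are the same constant; treated here as perfect agreement.
        return 1.0
    return 2.0 * cov / denom


def hybrid_loss(pred, target, alpha=0.5):
    """alpha * MSE + (1 - alpha) * (1 - CCC); alpha is a hypothetical weight."""
    return alpha * mse(pred, target) + (1.0 - alpha) * (1.0 - ccc(pred, target))
```

Predicting the mean for every item, for example `[2, 2, 2]` against targets `[1, 2, 3]`, gives a CCC of 0, so the CCC term contributes its full penalty even though the MSE is moderate. This is the kind of collapse toward the mean that the abstract says the hybrid loss is meant to discourage.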

HausaNLP at SemEval-2026 Task 7: Prompt-based Hausa Cultural Question Answering
Faisal Adam | Lukman Aliyu | Sani Aji | Abdulhamid Abubakar | Aliyu Rabiu Shuaibu
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

We describe HausaNLP’s submission to SemEval-2026 Task 7 Track 1 (short-answer cultural question answering). Our system is a training-free, prompt-based pipeline targeting native Hausa (ha-NG). Two design decisions distinguish it from a generic zero-shot baseline. First, we use locale-conditional prompting: ha-NG questions receive a system prompt instructing concise standard Hausa output with explicit Boko-script characters (á, â, Î, ű). Second, we use a two-model fallback pipeline: GPT-4o handles the primary pass, and Gemini 1.5 Flash retries any rows where the primary call returned an error or empty output, separating model knowledge failures from API-availability failures. On the official development leaderboard, our best run reached 36.4 accuracy. Error analysis shows that a non-trivial fraction of failures are placeholder strings caused by API errors rather than incorrect generations, and that surface-level mismatches (verbosity, orthographic variation) account for many of the remaining errors. Code, prompts, and processing scripts are released for reproducibility.
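
The two-model fallback logic can be sketched without any API client by treating each model as a plain callable. This is an illustration only: `answer_with_fallback` and its return convention are hypothetical, and real calls to the models named in the abstract would replace the callables.

```python
def answer_with_fallback(question, primary, fallback):
    """Try the primary model, then the fallback on error or empty output.

    primary and fallback are callables taking a question string and returning
    an answer string (hypothetical stand-ins for real API clients).
    Returns (answer, source), where source is "primary", "fallback", or "failed".
    """
    for name, model in (("primary", primary), ("fallback", fallback)):
        try:
            answer = model(question)
        except Exception:
            continue  # API-level failure: move on to the next model
        if answer and answer.strip():
            return answer.strip(), name
    # Both models failed; an explicit marker keeps API failures
    # distinguishable from wrong answers during error analysis.
    return "", "failed"
```

Recording the `source` alongside each answer is one way to separate API-availability failures from model knowledge failures afterwards, which is the distinction the abstract draws.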

2024

Arewa NLP’s Participation at WMT24
Mahmoud Ahmad | Auwal Khalid | Lukman Aliyu | Babangida Sani | Mariya Abdullahi
Proceedings of the Ninth Conference on Machine Translation

This paper presents the work of our team, “ArewaNLP,” for the WMT 2024 shared task. The paper describes the system submitted to the Ninth Conference on Machine Translation (WMT24). We participated in the English-Hausa text-only translation task. We fine-tuned the OPUS-MT-en-ha transformer model, and our submission achieved competitive results in this task. We achieve BLEU scores of 27.76, 40.31, and 5.85 on the Development Test, Evaluation Test, and Challenge Test sets, respectively.

Semantic Textual Relatedness (STR), a measure of meaning similarity between text elements, has become a key focus in the field of Natural Language Processing (NLP). We describe SemEval-2024 Task 1 on Semantic Textual Relatedness, which features three tracks (supervised, unsupervised, and cross-lingual learning) across African and Asian languages including Afrikaans, Algerian Arabic, Amharic, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Punjabi, Spanish, and Telugu. Our goal is to analyse how well sentence representations from mBERT, all-MiniLM-L6-v2, and bert-base-uncased capture textual relatedness. The effectiveness of these models is evaluated using the Spearman correlation metric, which assesses the strength of the monotonic relationship between paired data. The findings reveal the viability of transformer models in multilingual STR tasks.
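
For readers unfamiliar with the evaluation metric, Spearman correlation is the Pearson correlation of the ranks of the two score lists. The following is a minimal pure-Python sketch with average ranks for ties; the function names are illustrative, and in practice a library routine would normally be used instead.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation; returns NaN if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0.0 or sd_y == 0.0:
        return float("nan")
    return cov / (sd_x * sd_y)
```

For example, predicted scores `[0.1, 0.4, 0.9, 0.5]` against gold scores `[0.2, 0.3, 0.8, 0.9]` differ only in the order of the top two items, which gives a correlation of 0.8.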

Co-authors

Falalu Ibrahim Lawan 1

Alamin Musa 1

Nur Rabiu 1

Saheed Abdullahi Salahudeen 1

Babangida Sani 1
