Sani Aji

2026

Team HausaNLP at SemEval-2026 Task 9: Tackling Class Imbalance in Low-Resource Hausa Polarization Detection
Faisal Adam | Sani Aji | Lukman Aliyu | Abdulhamid Abubakar
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper describes our submission to SemEval-2026 Task 9, Subtask 2 (Hausa). The task involves identifying specific categories of polarization (Political, Religious, Ethnic, etc.) in Hausa social media comments. The dataset presented significant challenges, primarily extreme class imbalance and the low-resource nature of the language. Our system uses a pretrained multilingual transformer (Afro-XLMR-Large) fine-tuned with Weighted Binary Cross-Entropy loss and dynamic undersampling (1:3 ratio) to mitigate the scarcity of polarized examples. On the official test set, our system achieved a Macro-F1 score of 0.2346 and a Micro-F1 score of 0.2581. Our model is recall-oriented (Micro-Recall: 0.6166), demonstrating strong capability in detecting polarization, though precision remains a challenge (0.1632). We achieved our best per-class performance in the Political domain (F1: 0.48).
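As a rough illustration of the two imbalance techniques named in this abstract, the sketch below treats one polarization category as a binary problem. It is not the authors' code: the function names, the `pos_weight` parameter, and the reading of "dynamic" as re-drawing the negatives with a new seed each epoch are all assumptions.

```python
import math
import random

def undersample(labels, neg_per_pos=3, seed=0):
    """Keep every positive index and at most neg_per_pos negatives per positive."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, min(len(neg), neg_per_pos * len(pos)))
    return sorted(pos + keep_neg)

def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Assumed usage: a different seed per epoch gives a fresh 1:3 subset each time.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
epoch_indices = [undersample(labels, neg_per_pos=3, seed=epoch) for epoch in range(2)]
```

Raising `pos_weight` above 1 makes missed positives cost more than false alarms, which pushes a model toward recall at the expense of precision.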

Team HausaNLP at SemEval-2026 Task 4: Narratives via Semantic Embeddings
Faisal Adam | Lukman Aliyu | Sani Aji
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper presents Team HausaNLP’s submission to SemEval-2026 Task 4 (Track A), which requires identifying the more narratively similar of two candidate stories relative to an anchor. Narrative similarity is defined along three dimensions: abstract theme, course of action, and story outcomes. We conduct a systematic ablation comparing five approaches: a lexical TF-IDF baseline, two bi-encoder SBERT variants (all-MiniLM-L6-v2 and all-mpnet-base-v2), a paraphrase-focused embedding model, and a cross-encoder re-ranker. On the 200-instance development set, all-mpnet-base-v2 achieves the best performance (61.5% accuracy, 61.48 macro-F1), outperforming both TF-IDF (54.5%) and the official SBERT baseline (55.0%). Surprisingly, the cross-encoder re-ranker (55.5%) does not improve on the bi-encoders, which we attribute to the long-document nature of Wikipedia story summaries exceeding the model’s effective context window. On the official test set, our primary SBERT MiniLM submission achieved 61.50% accuracy (33rd of 44 teams). Our error analysis over 200 development instances identifies five systematic failure categories, distinct from the All Correct / Partial cases, including 23 Lexical Trap cases, 23 Hard Cases, and 24 Proposed-Recovery cases, thereby informing concrete directions for future work.
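The bi-encoder decision rule described in this abstract can be sketched as a cosine-similarity comparison. The sketch assumes each story has already been encoded into a non-zero vector by some embedding model. The function names and the tie-breaking rule (ties go to candidate A) are illustrative assumptions, not details from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pick_more_similar(anchor, cand_a, cand_b):
    """Return "A" if cand_a is at least as close to the anchor as cand_b, else "B"."""
    return "A" if cosine(anchor, cand_a) >= cosine(anchor, cand_b) else "B"
```

Because each story is embedded independently, the anchor vector can be computed once and reused for both candidates, which is the usual practical advantage of a bi-encoder over a cross-encoder.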

Team faisalm3 at SemEval-2026 Task 3: From Standard Regression to Distributional Alignment in Dimensional Sentiment Analysis
Faisal Adam | Lukman Aliyu | Sani Aji | Abdulhamid Abubakar | Aliyu Rabiu Shuaibu
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper describes our participation in SemEval-2026 Task 3: Dimensional Aspect-Based Sentiment Analysis (DimABSA) (Yu et al., 2026). We utilized a pre-trained DeBERTa-V3 backbone to capture semantic meaning through disentangled attention. While standard Mean Squared Error (MSE) loss establishes a performance floor, we propose a Hybrid MSE-CCC Loss to identify distributional relationships that simple regression misses. Our results demonstrate a 54.6% reduction in validation loss compared to the baseline, significantly improving detection in high-intensity emotional bins by mitigating the "regression to the mean" phenomenon.
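One plausible form of a hybrid MSE-CCC objective is a convex combination of MSE and one minus the concordance correlation coefficient (CCC). The sketch below uses the standard CCC definition. The mixing weight `alpha` and its default of 0.5 are assumptions, since the abstract does not state how the two terms are combined.

```python
def mse(pred, gold):
    """Mean squared error."""
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold)

def ccc(pred, gold):
    """Concordance correlation coefficient: 2*cov / (var_p + var_g + (mean_p - mean_g)^2)."""
    n = len(gold)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    # Undefined (division by zero) when both sequences are constant and equal.
    return 2 * cov / (var_p + var_g + (mean_p - mean_g) ** 2)

def hybrid_loss(pred, gold, alpha=0.5):
    """alpha * MSE + (1 - alpha) * (1 - CCC); alpha is a hypothetical mixing weight."""
    return alpha * mse(pred, gold) + (1 - alpha) * (1 - ccc(pred, gold))
```

A prediction that collapses to the mean, such as `[2, 2, 2]` against gold values `[1, 2, 3]`, has zero variance and therefore a CCC of 0, so the CCC term adds its maximum penalty for non-negative agreement. That is the sense in which such a term discourages regression to the mean.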

HausaNLP at SemEval-2026 Task 7: Prompt-based Hausa Cultural Question Answering
Faisal Adam | Lukman Aliyu | Sani Aji | Abdulhamid Abubakar | Aliyu Rabiu Shuaibu
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

We describe HausaNLP’s submission to SemEval-2026 Task 7 Track 1 (short-answer cultural question answering). Our system is a training-free, prompt-based pipeline targeting native Hausa (ha-NG). Two design decisions distinguish it from a generic zero-shot baseline. First, we use locale-conditional prompting: ha-NG questions receive a system prompt instructing concise standard Hausa output with explicit Boko-script characters (ɓ, ɗ, ƙ, ƴ). Second, we use a two-model fallback pipeline: GPT-4o handles the primary pass, and Gemini 1.5 Flash retries any rows where the primary call returned an error or empty output, separating model knowledge failures from API-availability failures. On the official development leaderboard, our best run reached 36.4 accuracy. Error analysis shows that a non-trivial fraction of failures are placeholder strings caused by API errors rather than incorrect generations, and that surface-level mismatches (verbosity, orthographic variation) account for many of the remaining errors. Code, prompts, and processing scripts are released for reproducibility.
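The two-model fallback described in this abstract can be sketched with plain callables standing in for the model clients. Nothing here reproduces the released code: `primary` and `fallback` are hypothetical functions that take a question and return a string, and the `PLACEHOLDER` marker is an assumed name for the placeholder strings mentioned in the error analysis.

```python
PLACEHOLDER = "[API_ERROR]"  # hypothetical marker for rows no model could answer

def answer_with_fallback(question, primary, fallback):
    """Try the primary model, then the fallback; return (answer, source)."""
    for name, model in (("primary", primary), ("fallback", fallback)):
        try:
            answer = model(question)
        except Exception:
            # Treat any exception as an availability failure and move on.
            continue
        if answer and answer.strip():
            return answer.strip(), name
    return PLACEHOLDER, "none"
```

Recording which source produced each answer keeps availability failures (source `"none"`) separate from wrong answers, which is the distinction the abstract draws between API-availability and model-knowledge failures.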
