Faisal Muhammad Adam
Also published as: Faisal Adam
2026
Team HausaNLP at SemEval-2026 Task 9: Tackling Class Imbalance in Low-Resource Hausa Polarization Detection
Faisal Adam | Sani Aji | Lukman Aliyu | Abdulhamid Abubakar
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper describes our submission to SemEval-2026 Task 9, Subtask 2 (Hausa). The task involves identifying specific categories of polarization (Political, Religious, Ethnic, etc.) in Hausa social media comments. The dataset presented significant challenges, primarily extreme class imbalance and the low-resource nature of the language. Our system uses a pretrained multilingual transformer (Afro-XLMR-Large) fine-tuned with Weighted Binary Cross-Entropy loss and dynamic undersampling (1:3 ratio) to mitigate the scarcity of polarized examples. On the official test set, our system achieved a Macro-F1 score of 0.2346 and a Micro-F1 score of 0.2581. Our model is recall-oriented (Micro-Recall: 0.6166), demonstrating strong capability in detecting polarization, though precision remains a challenge (0.1632). We achieved our best per-class performance in the Political domain (F1: 0.48).
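The two imbalance remedies named in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `pos_weight` parameter, the reading of "1:3" as one polarized example to three non-polarized ones, and the reading of "dynamic" as re-drawing the negatives with a new seed each epoch are all assumptions made for illustration.

```python
import math
import random

def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy in which positive labels are up-weighted.

    probs: predicted probabilities in (0, 1); labels: 0/1 targets.
    pos_weight (hypothetical name) multiplies the loss on positive examples.
    """
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def undersample(examples, is_positive, neg_per_pos=3, seed=0):
    """Keep every positive and at most neg_per_pos negatives per positive.

    Calling this once per epoch with a different seed gives a different
    negative subset each time (one possible reading of "dynamic").
    """
    rng = random.Random(seed)
    pos = [e for e in examples if is_positive(e)]
    neg = [e for e in examples if not is_positive(e)]
    keep = min(len(neg), neg_per_pos * len(pos))
    sampled = pos + rng.sample(neg, keep)
    rng.shuffle(sampled)
    return sampled
```

With 10 positives and 90 negatives, `undersample(..., neg_per_pos=3)` returns 40 examples per draw. Raising `pos_weight` increases the penalty for missed positives, which is consistent with a recall-oriented model, though the abstract does not state the weights used.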
Team HausaNLP at SemEval-2026 Task 4: Narratives via Semantic Embeddings
Faisal Adam | Lukman Aliyu | Sani Aji
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper presents Team HausaNLP’s submission to SemEval-2026 Task 4 (Track A), which requires identifying the more narratively similar of two candidate stories relative to an anchor. Narrative similarity is defined along three dimensions: abstract theme, course of action, and story outcomes. We conduct a systematic ablation comparing five approaches: a lexical TF-IDF baseline, two bi-encoder SBERT variants (all-MiniLM-L6-v2 and all-mpnet-base-v2), a paraphrase-focused embedding model, and a cross-encoder re-ranker. On the 200-instance development set, all-mpnet-base-v2 achieves the best performance (61.5% accuracy, 61.48 macro-F1), outperforming both TF-IDF (54.5%) and the official SBERT baseline (55.0%). Surprisingly, the cross-encoder re-ranker (55.5%) does not improve on the bi-encoders, which we attribute to the long-document nature of Wikipedia story summaries exceeding the model’s effective context window. On the official test set, our primary SBERT MiniLM submission achieved 61.50% accuracy (33rd of 44 teams). Our error analysis over 200 development instances identifies five systematic failure categories, distinct from the All Correct / Partial cases, including 23 Lexical Trap cases, 23 Hard Cases, and 24 Proposed-Recovery cases, thereby informing concrete directions for future work.
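The bi-encoder decision rule implied by this abstract can be sketched as follows, assuming each story has already been embedded into a fixed-length vector by some sentence encoder. The function names and the tie-breaking rule are hypothetical; the abstract does not specify them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def pick_more_similar(anchor_vec, cand_a_vec, cand_b_vec):
    """Return "A" or "B" for the candidate closer to the anchor; ties go to "A" (an assumption)."""
    sim_a = cosine(anchor_vec, cand_a_vec)
    sim_b = cosine(anchor_vec, cand_b_vec)
    return "A" if sim_a >= sim_b else "B"
```

Because each story is encoded independently, the same rule applies unchanged to TF-IDF vectors or to any of the embedding models being compared; only the vectors differ.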
Team faisalm3 at SemEval-2026 Task 3: From Standard Regression to Distributional Alignment in Dimensional Sentiment Analysis
Faisal Adam | Lukman Aliyu | Sani Aji | Abdulhamid Abubakar | Aliyu Rabiu Shuaibu
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper describes our participation in SemEval-2026 Task 3: Dimensional Aspect-Based Sentiment Analysis (DimABSA) (Yu et al., 2026). We utilized a pre-trained DeBERTa-V3 backbone to capture semantic meaning through disentangled attention. While standard Mean Squared Error (MSE) loss establishes a performance floor, we propose a Hybrid MSE-CCC Loss to identify distributional relationships that simple regression missed. Our results demonstrate a 54.6% reduction in validation loss compared to the baseline, significantly improving detection in high-intensity emotional bins by mitigating the "regression to the mean" phenomenon.
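One common way to combine MSE with the concordance correlation coefficient (CCC) is a weighted sum of MSE and 1 - CCC. The sketch below uses that form as an assumption; the abstract does not give the exact formulation, and the mixing weight `alpha` is a hypothetical parameter.

```python
def mse(pred, gold):
    """Mean squared error between two equal-length sequences."""
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold)

def ccc(pred, gold):
    """Concordance correlation coefficient: 2*cov / (var_p + var_g + (mean_p - mean_g)^2)."""
    n = len(gold)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    # denom == 0 only when both sequences are the same constant: treat as perfect agreement
    return 2.0 * cov / denom if denom > 0 else 1.0

def hybrid_loss(pred, gold, alpha=0.5):
    """alpha * MSE + (1 - alpha) * (1 - CCC); alpha is an illustrative choice."""
    return alpha * mse(pred, gold) + (1.0 - alpha) * (1.0 - ccc(pred, gold))
```

A model that always predicts the gold mean has zero variance, so its CCC is 0 and the 1 - CCC term stays at its maximum for non-negative CCC. This is how a CCC term can penalize "regression to the mean" even when MSE alone looks acceptable.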
HausaNLP at SemEval-2026 Task 7: Prompt-based Hausa Cultural Question Answering
Faisal Adam | Lukman Aliyu | Sani Aji | Abdulhamid Abubakar | Aliyu Rabiu Shuaibu
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
We describe HausaNLP’s submission to SemEval-2026 Task 7 Track 1 (short-answer cultural question answering). Our system is a training-free, prompt-based pipeline targeting native Hausa (ha-NG). Two design decisions distinguish it from a generic zero-shot baseline. First, we use locale-conditional prompting: ha-NG questions receive a system prompt instructing concise standard Hausa output with explicit Boko-script characters (á, â, Î, ű). Second, we use a two-model fallback pipeline: GPT-4o handles the primary pass, and Gemini 1.5 Flash retries any rows where the primary call returned an error or empty output, separating model knowledge failures from API-availability failures. On the official development leaderboard, our best run reached 36.4 accuracy. Error analysis shows that a non-trivial fraction of failures are placeholder strings caused by API errors rather than incorrect generations, and that surface-level mismatches (verbosity, orthographic variation) account for many of the remaining errors. Code, prompts, and processing scripts are released for reproducibility.
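The two-model fallback described in this abstract can be sketched with plain callables standing in for the model clients. This is an illustration under stated assumptions, not the released code: `answer_with_fallback`, the `source` tag, and the stub models in the usage example are hypothetical, and no real API client is called.

```python
def answer_with_fallback(question, primary, fallback):
    """Try the primary model; on an exception or empty output, retry with the fallback.

    primary and fallback are any callables mapping a question string to an answer string.
    Returns (answer, source), where source is "primary", "fallback", or "failed".
    """
    for source, model in (("primary", primary), ("fallback", fallback)):
        try:
            text = model(question)
        except Exception:
            continue  # availability failure: move on to the next model
        if text and text.strip():
            return text.strip(), source
    return "", "failed"

# Usage with stub models (hypothetical stand-ins for real API calls)
def flaky_primary(question):
    raise RuntimeError("simulated API error")

def stub_fallback(question):
    return " Kano "

print(answer_with_fallback("Wane birni ne?", flaky_primary, stub_fallback))
```

Recording which model produced each answer, and returning an explicit "failed" marker instead of a placeholder string, makes it possible to count availability failures separately from wrong answers during error analysis.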
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
Pedro Ortiz Suarez | Laurie Burchell | Catherine Arnett | Rafael Mosquera | Sara Hincapié Monsalve | Thom Vaughan | Damian Stewart | Malte Ostendorff | Idris Abdulmumin | Vukosi Marivate | Shamsuddeen Hassan Muhammad | Atnafu Lambebo Tonja | Hend Al-Khalifa | Nadia Ghezaiel Hammouda | Verrah Akinyi Otiende | Tack Hwa Wong | Jakhongir Saydaliev | Melika Nobakhtian | Muhammad Ravi Shulthan Habibi | Chalamalasetti Kranti | Carol Muchemi | Khang Nguyen | Faisal Muhammad Adam | Luis Frentzen Salim | Reem Alqifari | Cynthia Jayne Amol | Joseph Marvin Imperial | Ilker Kesen | Ahmad Mustafid | Pavel Stepachev | Leshem Choshen | David Anugraha | Hamada Nayel | Seid Muhie Yimam | Vallerie Alexandra Putra | My Chiffon Nguyen | Azmine Toushik Wasi | Gouthami Vadithya | Rob Van Der Goot | Lanwenn ar C’horr | Karan Dua | Andrew Yates | Mithil Bangera | Yeshil Bangera | Hitesh Laxmichand Patel | Shu Okabe | Fenal Ashokbhai Ilasariya | Dmitry Gaynullin | Genta Indra Winata | Yiyuan Li | Juan Pablo Martínez | Amit Agarwal | Ikhlasul Akmal Hanif | Raia Abu Ahmad | Esther Adenuga | Filbert Aurelian Tjiaranata | Weerayut Buaphet | Michael Anugraha | Sowmya Vajjala | Benjamin L Rice | Azril Hafizi Amirudin | Jesujoba Oluwadara Alabi | Srikant Panda | Yassine Toughrai | Bruhan Kyomuhendo | Daniel Ruffinelli | Akshata | Manuel Goulão | Ej Zhou | Ingrid Gabriela Franco Ramirez | Cristina Aggazzotti | Konstantin Dobler | Jun Kevin | Quentin Pagès | Nicholas Andrews | Nuhu Ibrahim | Mattes Ruckdeschel | Amr Keleg | Mike Zhang | Casper Rufaro Muziri | Saron Samuel | Sotaro Takeshita | Kun Kerdthaisong | Luca Foppiano | Rasul Dent | Tommaso Green | Ahmad Mustapha Wali | Kamohelo Makaaka | Vicky Feliren | Inshirah Idris | Hande Celikkanat | Abdulhamid Abubakar | Jean Maillard | Benoît Sagot | Thibault Clérice | Kenton Murray | Sarah K. K. Luger
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Language identification (LID) is a fundamental step in curating multilingual corpora. However, LID models still perform poorly for many languages, especially on the noisy and heterogeneous web data often used to train multilingual language models. In this paper, we introduce CommonLID, a community-driven, human-annotated LID benchmark for the web domain, covering 109 languages. Many of the included languages have been previously under-served, making CommonLID a key resource for developing more representative high-quality text corpora. We show CommonLID’s value by using it, alongside five other common evaluation sets, to test eight popular LID models. We analyse our results to situate our contribution and to provide an overview of the state of the art. In particular, we highlight that existing evaluations overestimate LID accuracy for many languages in the web domain. We make CommonLID and the code used to create it available under an open, permissive license.
Co-authors
- Abdulhamid Abubakar 4
- Sani Aji 4
- Lukman Aliyu 4
- Aliyu Rabiu Shuaibu 2
- Idris Abdulmumin 1
- Esther Adenuga 1
- Amit Agarwal 1
- Cristina Aggazzotti 1
- Raia Abu Ahmad 1
- Akshata 1
- Hend Al-Khalifa 1
- Jesujoba Alabi 1
- Vallerie Alexandra Putra 1
- Reem Alqifari 1
- Azril Hafizi Amirudin 1
- Cynthia Jayne Amol 1
- Nicholas Andrews 1
- David Anugraha 1
- Michael Anugraha 1
- Catherine Arnett 1
- Mithil Bangera 1
- Yeshil Bangera 1
- Weerayut Buaphet 1
- Laurie Burchell 1
- Hande Celikkanat 1
- Leshem Choshen 1
- Thibault Clérice 1
- Lanwenn ar C’horr 1
- Rasul Dent 1
- Konstantin Dobler 1
- Karan Dua 1
- Vicky Feliren 1
- Luca Foppiano 1
- Dmitry Gaynullin 1
- Manuel Goulão 1
- Tommaso Green 1
- Muhammad Ravi Shulthan Habibi 1
- Nadia Ghezaiel Hammouda 1
- Ikhlasul Akmal Hanif 1
- Nuhu Ibrahim 1
- Inshirah Idris 1
- Fenal Ashokbhai Ilasariya 1
- Joseph Marvin Imperial 1
- Amr Keleg 1
- Kun Kerdthaisong 1
- Ilker Kesen 1
- Jun Kevin 1
- Chalamalasetti Kranti 1
- Bruhan Kyomuhendo 1
- Yiyuan Li 1
- Sarah K. K. Luger 1
- Jean Maillard 1
- Kamohelo Makaaka 1
- Vukosi Marivate 1
- Juan Pablo Martínez 1
- Sara Hincapié Monsalve 1
- Rafael Mosquera 1
- Carol Muchemi 1
- Shamsuddeen Hassan Muhammad 1
- Kenton Murray 1
- Ahmad Mustafid 1
- Casper Rufaro Muziri 1
- Hamada Nayel 1
- Khang Nguyen 1
- My Chiffon Nguyen 1
- Melika Nobakhtian 1
- Shu Okabe 1
- Pedro Ortiz Suarez 1
- Malte Ostendorff 1
- Verrah Akinyi Otiende 1
- Quentin Pagès 1
- Srikant Panda 1
- Hitesh Laxmichand Patel 1
- Ingrid Gabriela Franco Ramirez 1
- Benjamin L Rice 1
- Mattes Ruckdeschel 1
- Daniel Ruffinelli 1
- Benoît Sagot 1
- Luis Frentzen Salim 1
- Saron Samuel 1
- Jakhongir Saydaliev 1
- Pavel Stepachev 1
- Damian Stewart 1
- Sotaro Takeshita 1
- Filbert Aurelian Tjiaranata 1
- Atnafu Lambebo Tonja 1
- Yassine Toughrai 1
- Gouthami Vadithya 1
- Sowmya Vajjala 1
- Rob Van Der Goot 1
- Thom Vaughan 1
- Ahmad Mustapha Wali 1
- Azmine Toushik Wasi 1
- Genta Indra Winata 1
- Tack Hwa Wong 1
- Andrew Yates 1
- Seid Muhie Yimam 1
- Mike Zhang 1
- Ej Zhou 1