Anuj Tiwari

2026

LingoResearchGroup at SemEval-2026 Task 9: Evaluating Prompt Variants for Polarization Detection
Pritam Kadasi | Anuj Tiwari | Mayank Singh
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper presents our submission to SemEval-2026 Task 9: Multilingual Text Classification Challenge - Polarization Detection, covering all three subtasks: (1) binary polarization detection, (2) polarization type classification, and (3) polarization manifestation identification. We systematically study short prompts, comparing twelve designed prompts that differ in terminology clarity, level of definitional detail, reasoning guidance, and use of in-context examples. Experiments are conducted with aya-101 and Gemma3-27B; the latter was chosen for the final submission based on its performance during development. On the official test set, averaged over 22 languages, our system achieves an average macro F1-score of 0.762 on Subtask 1, 0.587 on Subtask 2, and 0.444 on Subtask 3, with average accuracies of 0.819, 0.678, and 0.498, respectively. Through cross-task and cross-lingual analysis, we show that prompt-based approaches can effectively detect coarse-grained polarization but face increasing difficulty on fine-grained and multi-label sociolinguistic classification.


Sample-Size Scaling of the African Languages NLI Evaluation
Anuj Tiwari | Oluwapelumi Ogunremu | Terry Oko-odion | Jesujuwon Egbewale | Hannah Sopuruchi Nwokocha
Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026)

African languages have very little labeled data, and it is unclear whether increasing the amount of annotated data reliably improves downstream performance. We present a systematic sample-size scaling study of natural language inference (NLI) on 16 African languages based on the AfriXNLI benchmark. Under controlled conditions, two multilingual transformer models with roughly 0.6B parameters, XLM-R Large fine-tuned on XNLI and AfroXLM-R Large, are tested on sample sizes between 50 and 500 labeled examples, with results averaged across random subsampling runs. Contrary to the common assumption that performance increases monotonically with more data, we find strongly language-sensitive and often non-monotonic scaling behavior. Some languages show early saturation or a decrease in performance as sample size grows, as well as high variance in low-resource regimes. These results indicate that data volume alone is not enough to guarantee stable gains for African NLI, motivating language-sensitive dataset creation and stronger multilingual modeling strategies.

Co-authors

Mayank Singh 1
