Sha Newaz Mahmud
2026
The Argonauts at SemEval-2026 Task 9: Multilingual Polarization Detection and Classification Using LLM Prompting and Transformer Fine-Tuning
Sha Newaz Mahmud | Sajib Bhattacharjee | Md. Refaj Hossan | Kawsar Ahmed | Mohammed Moshiul Hoque
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Online polarization, defined as the pronounced division of public opinion into antagonistic groups, poses a significant threat to social cohesion. Automatic detection of polarization across diverse languages and cultures is essential for effective monitoring of online discourse. The challenge extends beyond identifying hate speech to recognizing more nuanced forms, including negative stereotypes, attribution of blame, and dehumanization. This work addresses SemEval-2026 Task 9, which focuses on detecting polarization in multiple languages. Specifically, Subtask 1 involves binary classification of message polarization, while Subtask 2 requires assigning multiple polarization labels in English and Bengali. For Subtask 1, Qwen3-14B is employed with structured few-shot prompting in 4-bit mode, yielding test macro-F1 scores of 0.847 for Bengali (4th place) and 0.808 for English (9th place). For Subtask 2, XLM-RoBERTa-large and RoBERTa-base are fine-tuned using an uneven loss (γ+ = 1, γ− = 4) and label-specific thresholds, which increase development macro-F1 by up to 24.6 points. The final test macro-F1 for English is 0.454 (21st place). Analysis indicates that large language model prompting enhances binary polarization detection, while threshold adjustment is critical for addressing class imbalance in multi-label tasks.
The Argonauts at SemEval 2026 Task 6: Large Language Models for Response Clarity Classification: Prompting, Fine-Tuning, and Data-Centric Approaches
Sajib Bhattacharjee | Sha Newaz Mahmud | Md. Refaj Hossan | Kawsar Ahmed | Mohammed Moshiul Hoque
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Detecting equivocation is essential, as indirect or evasive responses can shape public perception, influence political narratives, and undermine transparency in democratic discourse. To address the challenge of detecting evasive political responses on digital platforms, this work participates in the CLARITY task at SemEval-2026, which focuses on (i) clarity-level classification and (ii) fine-grained evasion-type classification in political question-answer contexts. This study introduces a data-centric framework that systematically examines the effects of class distribution and refinement strategies on the performance of Large Language Models (LLMs). A distribution-aware, LLM-augmented dataset was constructed by selectively paraphrasing minority-class instances to enhance class balance, and its performance was benchmarked against full, rebalanced, and undersampled training configurations. To comprehensively assess the proposed method, Qwen3-14B, Phi-4, Gemma-2 9B, and Mistral 7B were evaluated in in-context learning (ICL) settings (zero-shot and few-shot) and with LoRA fine-tuning. Experimental results indicate that fine-tuning Phi-4 with class rebalancing yields strong performance, achieving 74.77% on Subtask-1 and 51.55% on Subtask-2. Consequently, the system ranked 21st in Subtask-1 and 22nd in Subtask-2 on the official evaluation leaderboard.
Cuet_Neural_Navigators@DravidianLangTech 2026: Depression Detection from Malayalam and Tamil Speech using Self-Supervised Acoustic Models
Shuva Dey | Abir Dey | Sha Newaz Mahmud | Hasan Murad
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Depression detection from speech aims to find signs of depression using behavioral signals. This approach enables early mental health screening and makes it scalable. However, the task is tough because of subtle acoustic cues, differences among speakers, and language-specific patterns. In this work, we introduce our system for the Shared Task on Depression Detection in Dravidian Languages (DD-DL) at DravidianLangTech@ACL 2026. We focus on speech in Tamil and Malayalam. We explore pretrained self-supervised speech encoders, including HuBERT, XLS-R, and Whisper, to identify acoustic patterns related to depression directly from raw audio. Our method combines these models through ensembling to capture different acoustic features. The experiments use stratified evaluation and cross-lingual analysis to check how well the models work across languages. Results show that pretrained acoustic representations effectively capture vocal features of depression, achieving Macro-F1 scores of 0.9058 for Tamil and 0.9396 for Malayalam. However, cross-lingual transfer faces challenges because of phonetic and prosodic differences.
2025
Mind_Matrix at CQs-Gen 2025: Adaptive Generation of Critical Questions for Argumentative Interventions
Sha Newaz Mahmud | Shahriar Hossain | Samia Rahman | Momtazul Arefin Labib | Hasan Murad
Proceedings of the 12th Argument mining Workshop
To encourage computational argumentation through critical question generation (CQs-Gen), we propose an ACL 2025 CQs-Gen shared task system to generate critical questions (CQs) with the best effort to counter argumentative text by discovering logical fallacies, unjustified assertions, and implicit assumptions. Our system integrates a quantized language model, semantic similarity analysis, and a meta-evaluation feedback mechanism including the key stages such as data preprocessing, rationale-augmented prompting to induce specificity, diversity filtering for redundancy elimination, enriched meta-evaluation for relevance, and a feedback-reflect-refine loop for iterative refinement. Multi-metric scoring guarantees high-quality CQs. With robust error handling, our pipeline ranked 7th among 15 teams, outperforming baseline fact-checking approaches by enabling critical engagement and successfully detecting argumentative fallacies. This study presents an adaptive, scalable method that advances argument mining and critical discourse analysis.