Sandesh Kumar

2026

Habib University at SemEval-2026 Task 3: A Pipeline Approach for Dimensional Aspect-Based Sentiment Analysis
Muhammad Affan | M Hassan Shahzad | Mikaal Imam | Moiz Zulfiqar | Sandesh Kumar | Abdul Samad
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Aspect-based sentiment analysis has evolved from categorical polarity classification to fine-grained modeling of continuous affective dimensions. Dimensional Aspect-Based Sentiment Analysis (DimABSA) extends this paradigm by requiring both structured sentiment extraction and continuous valence–arousal (VA) regression in multilingual settings. In this paper, we present our system for SemEval-2026 Task 3, which evaluates this challenge across six languages and four domains, requiring systems to extract aspect–category–opinion quadruplets and predict VA scores on a 1–9 scale. We propose a modular four-stage multilingual transformer pipeline for element extraction, aspect–opinion pairing, category prediction, and VA regression. We conduct experiments over multiple models and training configurations, including VA rescaling to [-1,1], Gaussian label noise injection, Concordance Correlation Coefficient (CCC) loss, and Savitzky–Golay smoothing. Among all languages, our system achieves the lowest RMSE of 0.5333 on Subtask 1 and the highest cF1 of 0.5492 on Subtask 2. We further investigate data augmentation to improve low-resource performance and address label imbalance. Ultimately, our modular architecture demonstrated highly competitive cross-lingual transfer, achieving top-tier placements in low-resource settings, including 2nd place for Tatar and 6th place for Russian in dimensional regression.
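
For readers unfamiliar with two of the training configurations named in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes a plain linear map for rescaling VA scores from the 1–9 scale to [-1,1] and the standard definition of the Concordance Correlation Coefficient, with the loss taken as 1 - CCC. The function names `rescale_va`, `ccc`, and `ccc_loss` are hypothetical.

```python
from statistics import fmean


def rescale_va(score, low=1.0, high=9.0):
    """Linearly map a score from [low, high] onto [-1, 1] (assumed mapping)."""
    return 2.0 * (score - low) / (high - low) - 1.0


def ccc(pred, gold):
    """Concordance Correlation Coefficient of two equal-length sequences.

    Uses population (biased) variance and covariance. The value is undefined
    (division by zero) when both sequences are constant with equal means.
    """
    mean_p, mean_g = fmean(pred), fmean(gold)
    var_p = fmean((p - mean_p) ** 2 for p in pred)
    var_g = fmean((g - mean_g) ** 2 for g in gold)
    cov = fmean((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold))
    return 2.0 * cov / (var_p + var_g + (mean_p - mean_g) ** 2)


def ccc_loss(pred, gold):
    """Loss that is 0 at perfect agreement and grows as agreement drops."""
    return 1.0 - ccc(pred, gold)


if __name__ == "__main__":
    gold = [rescale_va(s) for s in (1.0, 5.0, 9.0)]
    pred = [-0.8, 0.1, 0.9]
    print(gold, ccc_loss(pred, gold))
```

Unlike RMSE, CCC penalizes both a low correlation and a systematic shift in mean or scale between predictions and gold labels, which is one plausible reason to pair it with rescaled targets.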

We describe our submission to SemEval-2026 Task 6: CLARITY, which aims to classify political question–answer pairs by response clarity and evasive technique. We investigate several approaches, including long-context transformers, multiple instance learning, hierarchical multi-task models, and a natural language inference (NLI) formulation. On the development set, our best-performing NLI model achieves a macro-F1 of 0.79 for Subtask 1, while our best attention-based MIL model achieves a macro-F1 of 0.43 for Subtask 2. On the hidden evaluation set, our official submission obtains macro-F1 scores of 0.81 for Subtask 1 and 0.45 for Subtask 2. Our findings demonstrate the benefits of entailment-based modeling for clarity prediction and localized reasoning for evasion detection under limited computational resources.

ConTexT at SemEval-2026 Task 5: Rating Plausibility of Word Senses in Ambiguous Stories through Narrative Understanding
Fakeha Faisal | Rubab Shah | Syeda Zaidi | Azkaa Nasir | Sandesh Kumar | Abdul Samad
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

In this paper, we report our system for SemEval-2026 Task 5, which predicts graded plausibility scores for target word senses in narrative context. We explore embedding-based similarity, transformer fine-tuning, and a three-stage curriculum combining WiC pretraining, Wasserstein distribution learning, and KL-based calibration. Our best model, DeBERTa-XLarge with curriculum training, achieves 78% accuracy within one standard deviation and a Spearman correlation of 0.70, with an overall test score of 0.74. Results show that distribution modeling better aligns with human plausibility judgments than single-score prediction.

HU at SemEval-2026 Task 10: Psycholinguistic Conspiracy Marker Extraction and Detection
Muhammad Quddussi Kashaf | Shahmir Mustafa Chaudhry | Marium Zeeshan | Nahyan Javed | Sandesh Kumar | Abdul Samad
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Modern media poses a complex challenge to verifying the credibility of information and public discourse due to the advent of conspiracy theory content. This paper presents our methodology for "SemEval-2026 Task 10: Psycholinguistic Conspiracy Marker Extraction and Detection". The task consists of two subtasks: extracting psycholinguistic markers from text using Named Entity Recognition (NER) techniques, and classifying Reddit comments as conspiratorial or non-conspiratorial. Our approach involved: (1) diverse extraction methodologies, including traditional BIO tagging schemes, the GlobalPointer framework, and the GLiNER2 architecture, (2) data augmentation and synthetic data generation via Large Language Models (LLMs), and (3) evaluating various transformer-based models, such as DistilBERT and COVID-Twitter-BERT. Our final system achieves a macro F1 score of 0.26 on Subtask 1 and 0.76 on Subtask 2.

MSqrd at SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization
Syeda Samah Daniyal | Muneeba Badar | Manal Hasan | Shifa Shah | Sandesh Kumar | Abdul Samad
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Online polarization, the critical division between social, political, or identity groups, often leads to hate speech and social fragmentation. Detecting polarization, especially across diverse linguistic and cultural contexts, is a critical challenge. This paper presents our submission for SemEval-2026 Task 9, which focuses on detecting multilingual, multicultural, and multievent online polarization (Naseem et al., 2025). The task is divided into three subtasks: (1) binary polarization detection, (2) multi-label classification of polarization type (e.g., political, racial, religious), and (3) multi-label identification of its manifestation (e.g., stereotype, vilification, dehumanization). For each subtask, we fine-tune BERT-based transformer models. Model configurations are described in Section 4. The results are evaluated using the macro F1 score. We achieve scores of 78.6, 55.8, and 44.6 on the development test set for subtasks 1, 2, and 3, respectively. Overall, the results demonstrate the effectiveness of BERT-based models for multilingual polarization detection.

HABIBTAZ at SemEval-2026 Task 11: Disentangling Formal Logic from Content via Synthetic Training and Multi-Objective Optimization
Abdullah Shaikh | Zain Naqi | Taha Zahid | Sandesh Kumar | Abdul Samad
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

While Large Language Models (LLMs) excel in many general NLP tasks, their formal reasoning capabilities are often compromised by content effects, demonstrating a measurable bias towards real-world plausibility. In this paper, we present our system for SemEval-2026 Task 11, which evaluates the ability of models to disentangle formal logic from content across 12 languages with and without distractor premises. We address this challenge using mDeBERTa-v3 networks fine-tuned on a synthetic, rule-based dataset of syllogistic schemes to avoid the semantic noise of LLM-augmented data. To explicitly decouple plausibility from logical structure, our training pipeline employs a multi-objective loss function combining Adaptive Group Distributionally Robust Optimization (DRO), a scheduled differentiable bias penalty, and KL-Divergence consistency regularization. Our system achieved #1 ranks and perfect Ranking Scores (100.0) with 0.00% bias and 100.0% accuracy on Subtask 1 (English), Subtask 2 (Noisy English), and Subtask 3 (Multilingual). On the highly complex Subtask 4 (Noisy Multilingual), the system achieved the 6th rank with 89.06% Accuracy and F1-score, alongside a limited 2.89% Bias and a 37.78 Ranking Score. Our dataset generation engine and codebase are publicly available to facilitate future work on robust logical reasoning.

Corpora Generation for Urdu Grammatical Error Correction
Syed Ahad | Burhanuddin Aliasghar Ezzi | Muhammad Arsalan Hussain | Sandesh Kumar | Abdul Samad
Findings of the Association for Computational Linguistics: ACL 2026

Grammatical Error Correction (GEC) for Urdu remains an under-researched area due to the lack of annotated datasets. This paper addresses the challenge of generating a robust corpus for fine-tuning deep learning models aimed at Urdu GEC. We propose a method for synthesizing a large dataset by collecting errors from the Urdu WikiEdits history, learning from them, and inserting similar errors into grammatically correct sentences, thereby creating pairs of grammatically correct and incorrect sentences. We introduce UrduGEC-Synthetic, a synthetically generated dataset produced through this pipeline. Furthermore, we introduce UrduGEC-Gold, a gold dataset built by extracting errors from students' exam copies. Finally, we fine-tune various models on UrduGEC-Synthetic and evaluate them against UrduGEC-Gold to show the quality of the synthetic data generation.

2025

Habib University at SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection
Owais Waheed | Hammad Sajid | Kushal Chandani | Muhammad Areeb Kazmi | Sandesh Kumar | Abdul Samad
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Emotion detection in text has emerged as a pivotal challenge in Natural Language Processing (NLP), particularly in multilingual and cross-lingual contexts. This paper presents our participation in SemEval 2025 Task 11, focusing on three subtasks: Multi-label Emotion Detection, Emotion Intensity Prediction, and Cross-lingual Emotion Detection. Leveraging state-of-the-art transformer models such as BERT and XLM-RoBERTa, we implemented baseline models and ensemble techniques to enhance predictive accuracy. Additionally, innovative approaches like data augmentation and translation-based cross-lingual emotion detection were used to address linguistic and class imbalances. Our results demonstrated significant improvements in F1 scores and Pearson correlations, showcasing the effectiveness of ensemble learning and transformer-based architectures in emotion recognition. This work advances the field by providing robust methods for emotion detection, particularly in low-resource and multilingual settings.

NarrativeMiners at SemEval-2025 Task 10: Combating Manipulative Narratives in Online News
Muhammad Khubaib | Muhammad Shoaib Khursheed | Muminah Khurram | Abdul Samad | Sandesh Kumar
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Our team, Narrative Miners, participated in SemEval-2025 Task 10 to tackle the challenge of detecting manipulative narratives in online news, focusing on the Ukraine-Russia war and climate change. We worked on three key subtasks: classifying entity roles, categorizing narratives and subnarratives, and generating concise narrative explanations. Using transformer-based models like BART, BERT, GPT-2, and Flan-T5, we implemented a structured pipeline and applied data augmentation to enhance performance. BART-CNN proved to be our best-performing model, significantly improving classification accuracy and explanation generation. Despite challenges like dataset limitations and class imbalance, our approach demonstrated the effectiveness of hierarchical classification and multilingual analysis in combating online disinformation. We used several data augmentation techniques to offset the class imbalances present in the dataset. We used different evaluation metrics for each subtask, chosen to suit the outcome that subtask requires. With this paper, we hope to play our part in mitigating the impact of harmful disinformation.

HU at SemEval-2025 Task 9: Leveraging LLM-Based Data Augmentation for Class Imbalance
Muhammad Saad | Meesum Abbas | Sandesh Kumar | Abdul Samad
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper presents a solution to the food hazard detection challenge in SemEval-2025 Task 9, focusing on overcoming class imbalance using data augmentation techniques. We employ large language models (LLMs) like GPT-4o, Gemini Flash 1.5, and T5 to generate synthetic data, alongside other methods like synonym replacement, back-translation, and paraphrasing. These augmented datasets are used to fine-tune transformer-based models such as DistilBERT, improving their performance in detecting food hazards and categorizing products. Our approach achieves notable improvements in macro-F1 scores for both subtasks, although challenges remain in detecting implicit hazards and handling extreme class imbalance. The paper also discusses various techniques, including class weighting and ensemble modeling, as part of the training process. Despite the improvements, further work is necessary to refine hazard detection, particularly for rare and implicit categories.
