Sandesh Kumar


2025

HU at SemEval-2025 Task 9: Leveraging LLM-Based Data Augmentation for Class Imbalance
Muhammad Saad | Meesum Abbas | Sandesh Kumar | Abdul Samad
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper presents a solution to the food hazard detection challenge in SemEval-2025 Task 9, focusing on overcoming class imbalance using data augmentation techniques. We employ large language models (LLMs) such as GPT-4o, Gemini Flash 1.5, and T5 to generate synthetic data, alongside other methods such as synonym replacement, back-translation, and paraphrasing. These augmented datasets are used to fine-tune transformer-based models such as DistilBERT, improving their performance in detecting food hazards and categorizing products. Our approach achieves notable improvements in macro-F1 scores for both subtasks, although challenges remain in detecting implicit hazards and handling extreme class imbalance. The paper also discusses various techniques, including class weighting and ensemble modeling, as part of the training process. Despite the improvements, further work is necessary to refine hazard detection, particularly for rare and implicit categories.

NarrativeMiners at SemEval-2025 Task 10: Combating Manipulative Narratives in Online News
Muhammad Khubaib | Muhammad Shoaib Khursheed | Muminah Khurram | Abdul Samad | Sandesh Kumar
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Our team, Narrative Miners, participated in SemEval-2025 Task 10 to tackle the challenge of detecting manipulative narratives in online news, focusing on the Ukraine-Russia war and climate change. We worked on three key subtasks: classifying entity roles, categorizing narratives and subnarratives, and generating concise narrative explanations. Using transformer-based models such as BART, BERT, GPT-2, and Flan-T5, we implemented a structured pipeline and applied data augmentation to enhance performance. BART-CNN proved to be our best-performing model, significantly improving classification accuracy and explanation generation. Despite challenges such as dataset limitations and class imbalance, our approach demonstrated the effectiveness of hierarchical classification and multilingual analysis in combating online disinformation. We used several data augmentation techniques to address the class imbalance present in the dataset. We used a separate evaluation metric for each subtask, chosen to suit that subtask's output. With this paper, we hope to play our part in mitigating the impact of harmful disinformation.

Habib University at SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection
Owais Waheed | Hammad Sajid | Kushal Chandani | Muhammad Areeb Kazmi | Sandesh Kumar | Abdul Samad
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Emotion detection in text has emerged as a pivotal challenge in Natural Language Processing (NLP), particularly in multilingual and cross-lingual contexts. This paper presents our participation in SemEval-2025 Task 11, focusing on three subtasks: Multi-label Emotion Detection, Emotion Intensity Prediction, and Cross-lingual Emotion Detection. Leveraging state-of-the-art transformer models such as BERT and XLM-RoBERTa, we implemented baseline models and ensemble techniques to enhance predictive accuracy. Additionally, approaches such as data augmentation and translation-based cross-lingual emotion detection were used to address linguistic and class imbalances. Our results demonstrated significant improvements in F1 scores and Pearson correlations, showcasing the effectiveness of ensemble learning and transformer-based architectures in emotion recognition. This work advances the field by providing robust methods for emotion detection, particularly in low-resource and multilingual settings.