Junaid Rashid

2026

UPR at SemEval-2026 Task 9: Multi-Label Classification of Polarization Across Social Dimensions and Manifestation Identification in Urdu
Mtayyaba Shahzad | Inzmam Khadam | Zaufishan Mahmood | Junaid Rashid | Shamaila Hayat | Fakhar Ayub
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

The analysis of polarized content on social networks is crucial for understanding public discourse; however, research on low-resource languages such as Urdu remains limited. In this work, we address two complementary subtasks of polarization analysis in Urdu social media text. First, we formulate polarization classification across multiple social dimensions (political, religious, racial/ethnic, gender/sexual, and other) as a multi-label task. We fine-tune XLM-RoBERTa for multi-label classification with language-specific preprocessing, duplicate filtering, and data augmentation to handle class imbalance. The proposed model achieves a Macro F1-score of 0.758 for social-dimension polarization classification. Second, we perform polarization manifestation identification, focusing on how polarization is expressed in text through six manifestations: stereotype, vilification, dehumanization, extreme language, lack of empathy, and invalidation. Using the same transformer-based framework with imbalance-aware training, our system achieves a Macro F1-score of 0.72 on the official test set. These results demonstrate the effectiveness of multilingual transformer models for multi-dimensional polarization analysis in low-resource Urdu text.

UPR at SemEval-2026 Task 9: Polarization Detection in Urdu with Language-Specific Transformer and Data Augmentation
Alishba Wazir | Muhammad Asad Khan | Junaid Rashid | Shamaila Hayat | Samira Kanwal
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper addresses polarization detection in Urdu, a low-resource language characterized by complex morphology and insufficient annotated data. We formulate the task as binary classification of social media posts into polarized and non-polarized categories. Our approach is based on Urdu-BERT, a language-specific transformer model, combined with language-specific preprocessing, duplicate removal, and data augmentation to mitigate class imbalance and improve generalization. Experimental results show that the fine-tuned Urdu-BERT outperforms TF-IDF-based lexical machine learning baselines and achieves strong performance relative to multilingual transformer baselines. The findings indicate that language-specific pretrained transformers, when combined with appropriate preprocessing and augmentation strategies, provide an effective and generalizable framework for low-resource Urdu polarization detection.

2023

Temporal Tides of Emotional Resonance: A Novel Approach to Identify Mental Health on Social Media
Usman Naseem | Surendrabikram Thapa | Qi Zhang | Junaid Rashid | Liang Hu | Mehwish Nasim
Proceedings of the 11th International Workshop on Natural Language Processing for Social Media

2022

Incorporating Medical Knowledge to Transformer-based Language Models for Medical Dialogue Generation
Usman Naseem | Ajay Bandi | Shaina Raza | Junaid Rashid | Bharathi Raja Chakravarthi
Proceedings of the 21st Workshop on Biomedical Language Processing

Medical dialogue systems have the potential to assist doctors in expanding access to medical care, improving the quality of patient experiences, and lowering medical expenses. Despite this potential, the computational methods are still in their early stages and are not ready for widespread application. Existing transformer-based language models have shown promising results but lack domain-specific knowledge. To diagnose as doctors do, however, an automatic medical diagnosis system must meet more stringent requirements for the rationality of the dialogue in the context of relevant knowledge. In this study, we propose a new method that addresses the challenges of medical dialogue generation by incorporating medical knowledge into transformer-based language models. The method leverages an external medical knowledge graph and injects triples as domain knowledge into the utterances. Automatic and human evaluation on a publicly available dataset demonstrates that incorporating medical knowledge outperforms several state-of-the-art baseline methods.

A DistilBERTopic Model for Short Text Documents
Junaid Rashid | Jungeun Kim | Usman Naseem | Amir Hussain
Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association