Slur and Emoji Aware Models for Hate and Sentiment Detection in Roman Urdu Transgender Discourse

Muhammad Owais Raza, Aqsa Umar, Mehrub Awan


Abstract
The rise of social media has amplified both the visibility and vulnerability of marginalized communities, particularly the transgender population in South Asia. While hate speech detection has seen considerable progress in high-resource languages like English, under-resourced and code-mixed languages such as Roman Urdu remain significantly understudied. This paper presents a novel Roman Urdu dataset derived from Instagram comments on transgender-related content, capturing the intricacies of multilingual, code-mixed, and emoji-laden social discourse. We introduce a transphobic slur lexicon specific to Roman Urdu and a semantic emoji taxonomy grounded in contextual usage. These resources are used to perform fine-grained classification of sentiment and hate speech using both traditional machine learning models and transformer-based architectures. The findings show that our custom-trained BERT-based models, Senti-RU-Bert and Hate-RU-Bert, achieve the best performance, with F1 scores of 80.39% for sentiment classification and 77.34% for hate speech classification. Ablation studies reveal consistent performance gains when slur and emoji features are included.
Anthology ID:
2025.lowresnlp-1.14
Volume:
Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Ernesto Luis Estevanell-Valladares, Alicia Picazo-Izquierdo, Tharindu Ranasinghe, Besik Mikaberidze, Simon Ostermann, Daniil Gurgurov, Philipp Mueller, Claudia Borg, Marián Šimko
Venues:
LowResNLP | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
131–139
URL:
https://preview.aclanthology.org/corrections-2026-01/2025.lowresnlp-1.14/
Cite (ACL):
Muhammad Owais Raza, Aqsa Umar, and Mehrub Awan. 2025. Slur and Emoji Aware Models for Hate and Sentiment Detection in Roman Urdu Transgender Discourse. In Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages, pages 131–139, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Slur and Emoji Aware Models for Hate and Sentiment Detection in Roman Urdu Transgender Discourse (Raza et al., LowResNLP 2025)
PDF:
https://preview.aclanthology.org/corrections-2026-01/2025.lowresnlp-1.14.pdf