Aftab Rafique
2026
ShahiEmotion: A Benchmark Dataset for Punjabi Shahmukhi Emotion Detection
Usman Nawaz | Muhammad Junaid Iqbal | Tahir Alyas | Muhammad Asaf | Shumayla Yaqoob | Usman Ahmed Raza | Muhammad Amin Nadim | Aftab Rafique | Faisal Rehman
Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026)
Emotion detection is an important text classification task with applications in sentiment analysis, social media monitoring, human-computer interaction, and affective language understanding. However, Punjabi written in the Shahmukhi script remains severely under-resourced for this task, with few benchmark resources available for supervised evaluation. This paper introduces ShahiEmotion, a new Punjabi Shahmukhi emotion detection dataset containing 30,379 sentence-level instances annotated with seven emotion categories: sadness, surprise, happiness, anger, neutral, fear, and disgust. The dataset is designed to support research in a low-resource setting characterized by script-specific challenges, lexical variation, and substantial class imbalance. We formulate emotion detection as a sentence-level classification task and establish baseline results by fine-tuning four pretrained transformer models (multilingual BERT, multilingual DistilBERT, XLM-RoBERTa, and Urdu RoBERTa) under the same training and evaluation setting with standard cross-entropy loss. Experimental results show that XLM-RoBERTa provides the strongest overall performance among the compared models. The best model achieves 77.95% accuracy, 58.47% macro-F1, and 77.60% weighted-F1 on the test set. The dataset, evaluation protocol, and baseline results introduced in this work are intended to support future research on Punjabi Shahmukhi emotion analysis and low-resource NLP.
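The gap between the reported macro-F1 and weighted-F1 is what one would expect under class imbalance: macro-F1 averages per-class F1 equally, while weighted-F1 weights each class by its number of test instances. The following is a minimal sketch of the three metrics in plain Python. The function name `classification_scores` and the ten-sentence toy labels are hypothetical and are not drawn from ShahiEmotion or from the paper's evaluation code.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Return accuracy, macro-F1, weighted-F1, and per-class F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of gold instances per class
    pairs = list(zip(y_true, y_pred))

    per_class = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class[label] = 2 * tp / denom if denom else 0.0

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Macro-F1: every class counts equally, however rare it is.
    macro_f1 = sum(per_class.values()) / len(labels)
    # Weighted-F1: each class counts in proportion to its support.
    weighted_f1 = sum(per_class[l] * support[l] for l in labels) / len(pairs)
    return accuracy, macro_f1, weighted_f1, per_class


# Hypothetical toy labels: a frequent class and a rare class.
y_true = ["neutral"] * 8 + ["fear"] * 2
y_pred = ["neutral"] * 8 + ["fear", "neutral"]

accuracy, macro_f1, weighted_f1, per_class = classification_scores(y_true, y_pred)
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} weighted_f1={weighted_f1:.4f}")
```

In this toy case a single error on the rare class lowers macro-F1 far more than weighted-F1, which is why macro-F1 is the more informative number when minority emotions matter.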