Daniel Saeedi
2025
GT-NLP at SemEval-2025 Task 11: EmoRationale, Evidence-Based Emotion Detection via Retrieval-Augmented Generation
Daniel Saeedi | Alireza Kheirandish | Sirwe Saeedi | Hossein Sahour | Aliakbar Panahi | Iman Naeeni
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Emotion detection in multilingual settings presents significant challenges, particularly for low-resource languages where labeled datasets are scarce. To address these limitations, we introduce EmoRationale, a Retrieval-Augmented Generation (RAG) framework designed to enhance explainability and cross-lingual generalization in emotion detection. Our approach combines vector-based retrieval with in-context learning in large language models (LLMs), using semantically relevant examples to improve classification accuracy and interpretability. Unlike traditional fine-tuning methods, our system provides evidence-based reasoning for its predictions, making emotion detection more transparent and adaptable across diverse linguistic contexts. Experimental results on the SemEval-2025 Task 11 dataset demonstrate that our RAG-based method achieves strong performance in multi-label emotion classification, emotion intensity assessment, and cross-lingual emotion transfer, surpassing conventional models in interpretability while remaining cost-effective.
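The retrieve-then-prompt idea the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the bag-of-words similarity stands in for a real sentence embedding model, and the prompt would be sent to an LLM rather than returned as a string; all function names here are hypothetical.

```python
# Illustrative sketch of retrieval-augmented in-context emotion classification.
# The toy embed/cosine functions are stand-ins for a real vector encoder.
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a sentence encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank labeled examples by vector similarity to the query text.
    q = embed(query)
    ranked = sorted(corpus, key=lambda ex: cosine(q, embed(ex["text"])), reverse=True)
    return ranked[:k]

def build_prompt(query, examples):
    # Few-shot prompt: retrieved examples serve as in-context evidence,
    # letting the LLM ground its prediction in concrete retrieved cases.
    lines = ["Classify the emotion. Cite the examples as evidence."]
    for ex in examples:
        lines.append(f'Text: "{ex["text"]}" -> Emotion: {ex["label"]}')
    lines.append(f'Text: "{query}" -> Emotion:')
    return "\n".join(lines)

corpus = [
    {"text": "I am so happy today", "label": "joy"},
    {"text": "this makes me furious", "label": "anger"},
    {"text": "what a wonderful happy surprise", "label": "joy"},
]
examples = retrieve("feeling happy and grateful", corpus, k=2)
prompt = build_prompt("feeling happy and grateful", examples)
```

Because the retrieved examples appear verbatim in the prompt, the model's answer can point back at them, which is what makes the prediction evidence-based rather than opaque.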
2022
CS/NLP at SemEval-2022 Task 4: Effective Data Augmentation Methods for Patronizing Language Detection and Multi-label Classification with RoBERTa and GPT3
Daniel Saeedi | Sirwe Saeedi | Aliakbar Panahi | Alvis C.M. Fong
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
This paper presents a combination of data augmentation methods to boost the performance of state-of-the-art transformer-based language models on Patronizing and Condescending Language (PCL) detection and multi-label PCL classification tasks. These tasks differ fundamentally from sentiment analysis, because attitudes that read as positive or negative in context are not necessarily positive or negative for PCL purposes. Our ablation study shows that the PCL dataset is extremely imbalanced. This paper presents a modified version of the PEGASUS sentence-paraphrasing model that removes its maximum-sequence-length limitation: the proposed algorithm imposes no fixed maximum input length when paraphrasing sequences. Augmenting the underrepresented classes of annotated data yielded competitive results, placing among the top 16 SemEval-2022 participants. The paper's approaches rely on fine-tuning pretrained RoBERTa and GPT-3 models, such as the Davinci and Curie engines, on the enriched PCL dataset. Furthermore, we discuss a few-shot learning technique for overcoming the limitations of low-resource NLP problems.
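The abstract's claim of removing the paraphraser's maximum-input-length limit suggests a chunking scheme: split a long input into pieces that each fit the model's window, paraphrase each piece, then rejoin. The sketch below illustrates that idea under stated assumptions; `paraphrase_fn` is a hypothetical stand-in for a model-backed paraphraser (PEGASUS itself is not called here), and the word budget is arbitrary.

```python
# Hedged sketch of length-unbounded paraphrasing via sentence chunking.
# paraphrase_fn is an assumed callable (e.g. wrapping a seq2seq model).
def chunk_sentences(text, max_words=12):
    # Greedily pack whole sentences into chunks under a word budget,
    # so each chunk fits within a model's maximum sequence length.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    chunks, current, count = [], [], 0
    for s in sentences:
        n = len(s.split())
        if current and count + n > max_words:
            chunks.append(". ".join(current) + ".")
            current, count = [], 0
        current.append(s)
        count += n
    if current:
        chunks.append(". ".join(current) + ".")
    return chunks

def paraphrase_long(text, paraphrase_fn, max_words=12):
    # Apply the paraphraser chunk by chunk, so the input as a whole
    # is no longer bounded by a single forward pass's length limit.
    return " ".join(paraphrase_fn(c) for c in chunk_sentences(text, max_words))

text = "one two three. four five six seven. eight nine."
chunks = chunk_sentences(text, max_words=5)
roundtrip = paraphrase_long(text, lambda c: c, max_words=5)
```

Packing whole sentences (rather than cutting mid-sentence) keeps each chunk coherent enough for the paraphraser to rewrite it meaningfully.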
Co-authors
- Aliakbar Panahi 2
- Sirwe Saeedi 2
- Alvis C.M. Fong 1
- Alireza Kheirandish 1
- Iman Naeeni 1