Daniel Saeedi


2025

Emotion detection in multilingual settings presents significant challenges, particularly for low-resource languages where labeled datasets are scarce. To address these limitations, we introduce EmoRationale, a Retrieval-Augmented Generation (RAG) framework designed to enhance explainability and cross-lingual generalization in emotion detection. Our approach combines vector-based retrieval with in-context learning in large language models (LLMs), using semantically relevant examples to enhance classification accuracy and interpretability. Unlike traditional fine-tuning methods, our system provides evidence-based reasoning for its predictions, making emotion detection more transparent and adaptable across diverse linguistic contexts. Experimental results on the SemEval-2025 Task 11 dataset demonstrate that our RAG-based method achieves strong performance in multi-label emotion classification, emotion intensity assessment, and cross-lingual emotion transfer, surpassing conventional models in interpretability while remaining cost-effective.

2022

This paper presents a combination of data augmentation methods to boost the performance of state-of-the-art transformer-based language models for Patronizing and Condescending Language (PCL) detection and multi-label PCL classification tasks. These tasks are inherently different from sentiment analysis because positive/negative hidden attitudes in the context will not necessarily be considered positive/negative for PCL tasks. The oblation study observes that the imbalance degree of PCL dataset is in the extreme range. This paper presents a modified version of the sentence paraphrasing deep learning model (PEGASUS) to tackle the limitation of maximum sequence length. The proposed algorithm has no specific maximum input length to paraphrase sequences. Our augmented underrepresented class of annotated data achieved competitive results among top-16 SemEval-2022 participants. This paper’s approaches rely on fine-tuning pretrained RoBERTa and GPT3 models such as Davinci and Curie engines with an extra-enriched PCL dataset. Furthermore, we discuss Few-Shot learning technique to overcome the limitation of low-resource NLP problems.