This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
AnastasiiaDemidova
Fixing paper assignments
Please select all papers that do not belong to this person.
Indicate below which author they should be assigned to.
The emotion recognition task has become increasingly popular as it has a wide range of applications in many fields, such as mental health, product management, and population mood state monitoring. SemEval 2025 Task 11 Track A framed the emotion recognition problem as a multi-label classification task. This paper presents our proposed system submissions in the following languages: English, Algerian and Moroccan Arabic, Brazilian and Mozambican Portuguese, German, Spanish, Nigerian-Pidgin, Russian, and Swedish. Here, we compare the emotion-detecting abilities of generative and discriminative pre-trained language models, exploring multiple approaches, including curriculum learning, in-context learning, and instruction and few-shot fine-tuning. We also propose an extended architecture method with a feature fusion technique enriched with emotion scores and a self-attention mechanism. We find that BERT-based models fine-tuned on data of a corresponding language achieve the best results across multiple languages for multi-label text-based emotion classification, outperforming both baseline and generative models.
Large language models (LLMs) play a crucial role in a wide range of real world applications. However, concerns about their safety and ethical implications are growing. While research on LLM safety is expanding, there is a noticeable gap in evaluating safety across multiple languages, especially in Arabic and Russian. We address this gap by exploring biases in LLMs across different languages and contexts, focusing on GPT-3.5 and Gemini. Through carefully designed argument-based prompts and scenarios in Arabic, English, and Russian, we examine biases in cultural, political, racial, religious, and gender domains. Our findings reveal biases in these domains. In particular, our investigation uncovers subtle biases where each model tends to present winners as those speaking the primary language the model is prompted with. Our study contributes to ongoing efforts to ensure justice and equality in LLM development and emphasizes the importance of further research towards responsible progress in this field.
Navigating the intricacies of machine translation (MT) involves tackling the nuanced disparities between Arabic dialects and Modern Standard Arabic (MSA), presenting a formidable obstacle. In this study, we delve into Subtask 3 of the NADI shared task (CITATION), focusing on the translation of sentences from four distinct Arabic dialects into MSA. Our investigation explores the efficacy of various models, including Jais, NLLB, GPT-3.5, and GPT-4, in this dialect-to-MSA translation endeavor. Our findings reveal that Jais surpasses all other models, boasting an average BLEU score of 19.48 in the combination of zero- and few-shot setting, whereas NLLB exhibits the least favorable performance, garnering a BLEU score of 8.77.