@inproceedings{sosea-caragea-2025-hard,
title = "Hard Emotion Test Evaluation Sets for Language Models",
author = "Sosea, Tiberiu and
Caragea, Cornelia",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.findings-naacl.443/",
pages = "7930--7944",
ISBN = "979-8-89176-195-7",
abstract = "Language models perform well on emotion datasets but it remains unclear whether these models indeed understand emotions expressed in text or simply exploit supperficial lexical cues (e.g., emotion words). In this paper, we present two novel test evaluation sets sourced from two existing datasets that allow us to evaluate whether language models make real inferential decisions for emotion detection or not. Our human-annotated test sets are created by iteratively rephrasing input texts to gradually remove explicit emotion cues (while preserving the semantic similarity and the emotions) until a strong baseline BERT model yields incorrect predictions. Using our new test sets, we carry out a comprehensive analysis into the capabilities of small and large language models to predict emotions. Our analysis reveals that all models struggle to correctly predict emotions when emotion lexical cues become scarcer and scarcer, but large language models perform better than small pre-trained language models and push the performance by 14{\%} over the 5{\%} BERT baseline. We make our evaluation test sets and code publicly available."
}