@inproceedings{sampath-etal-2025-seer,
  title     = {{SEER}: The Span-based Emotion Evidence Retrieval Benchmark},
  author    = {Sampath, Aneesha and
               Aran, Oya and
               Mower Provost, Emily},
  editor    = {Inui, Kentaro and
               Sakti, Sakriani and
               Wang, Haofen and
               Wong, Derek F. and
               Bhattacharyya, Pushpak and
               Banerjee, Biplab and
               Ekbal, Asif and
               Chakraborty, Tanmoy and
               Singh, Dhirendra Pratap},
  booktitle = {Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics},
  month     = dec,
  year      = {2025},
  address   = {Mumbai, India},
  publisher = {The Asian Federation of Natural Language Processing and The Association for Computational Linguistics},
  url       = {https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.findings-ijcnlp.76/},
  pages     = {1248--1267},
  isbn      = {979-8-89176-303-6},
  abstract  = {Emotion recognition methods typically assign labels at the sentence level, obscuring the specific linguistic cues that signal emotion. This limits their utility in applications requiring targeted responses, such as empathetic dialogue and clinical support, which depend on knowing which language expresses emotion. The task of identifying \textit{emotion evidence} {--} text spans conveying emotion {--} remains underexplored due to a lack of labeled data. Without span-level annotations, we cannot evaluate whether models truly localize emotion expression, nor can we diagnose the sources of emotion misclassification. We introduce the SEER (Span-based Emotion Evidence Retrieval) Benchmark to evaluate Large Language Models (LLMs) on this task. SEER evaluates single and multi-sentence span identification with new annotations on 1200 real-world sentences. We evaluate 14 LLMs and find that, on single-sentence inputs, the strongest models match the performance of average human annotators, but performance declines in multi-sentence contexts. Key failure modes include fixation on emotion keywords and false positives in neutral text.},
}
Markdown (Informal)
[SEER: The Span-based Emotion Evidence Retrieval Benchmark](https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.findings-ijcnlp.76/) (Sampath et al., Findings 2025)
ACL
- Aneesha Sampath, Oya Aran, and Emily Mower Provost. 2025. SEER: The Span-based Emotion Evidence Retrieval Benchmark. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 1248–1267, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.