Robert Budac


2025

pdf bib
CR4-NarrEmote: An Open Vocabulary Dataset of Narrative Emotions Derived Using Citizen Science
Andrew Piper | Robert Budac
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

We introduce “Citizen Readers for Narrative Emotions” (CR4-NarrEmote), a large-scale, open-vocabulary dataset of narrative emotions derived through a citizen science initiative. Over a four-month period, 3,738 volunteers contributed more than 200,000 emotion annotations across 43,000 passages from long-form fiction and non-fiction, spanning 150 years, twelve genres, and multiple Anglophone cultural contexts. To facilitate model training and comparability, we provide mappings to both dimensional (Valence-Arousal-Dominance) and categorical (NRC Emotion) frameworks. We evaluate annotation reliability using lexical, categorical, and semantic agreement measures, and find substantial alignment between citizen science annotations and expert-generated labels. As the first open-vocabulary resource focused on narrative emotions at scale, CR4-NarrEmote provides an important foundation for affective computing and narrative understanding.