@inproceedings{lindevelt-etal-2026-correlation,
    title     = {The Correlation Between Emotion in Text and Speech Segments is Limited: A Cross-Modal Study},
    author    = {Lindevelt, David and
      Verberne, Suzan and
      Broekens, Joost},
    editor    = {Demberg, Vera and
      Inui, Kentaro and
      M{\`a}rquez, Llu{\'i}s},
    booktitle = {Findings of the {A}ssociation for {C}omputational {L}inguistics: {EACL} 2026},
    month     = mar,
    year      = {2026},
    address   = {Rabat, Morocco},
    publisher = {Association for Computational Linguistics},
    url       = {https://aclanthology.org/2026.findings-eacl.136/},
    pages     = {2611--2621},
    isbn      = {979-8-89176-386-9},
    abstract  = {Although expressive TTS systems aim to capture human-like emotion, little is known about how well emotional signals in text correspond to those in speech. In this short paper, we investigate how emotion (Valence, Arousal, Dominance) in text relates to emotion in speech. We use 8 large language models for identifying emotion in text and two audio models for emotion in speech, across three genres: Podcasts, Audiobooks and TED talks. Findings show that while language models perform well on emotion recognition from situational text, and the audio models perform well on speech, they show a strong correlation for Valence only. Further, the genre of the content significantly impacts the correlation: audiobooks exhibit higher text-audio correlation than TED talks. Finally, we show that more context for LLMs fails to improve this correlation between text and speech emotion prediction. Our results highlight that emotional signals in text do not correspond well to those in speech: emotion prediction from text alone is insufficient for emotional TTS.}
}
Markdown (Informal)
[The Correlation Between Emotion in Text and Speech Segments is Limited: A Cross-Modal Study](https://aclanthology.org/2026.findings-eacl.136/) (Lindevelt et al., Findings 2026)
ACL