Computational Narrative Understanding for Expressive Text-to-Speech

Gaspard Michel, Elena V. Epure, Christophe Cerisara


Abstract
Recent advances in text-to-speech (TTS) have been driven by large, multi-domain speech corpora, yet the expressive potential of audiobook data remains underexamined. We argue that human-narrated audiobooks, particularly fictional works, contain rich and diverse prosodic cues arising from the natural alternation between neutral narration and expressive character dialogue. Building on this observation, we introduce LibriQuote, a large-scale dataset of 5.3K hours of expressive speech drawn from character quotations. Each quote is supplemented with contextual pseudo-labels for the speech verbs and adverbs that characterize the intended delivery of direct speech (e.g., “he whispered softly”). We find that fine-tuning a flow-matching model on LibriQuote yields substantial improvements in expressivity and intelligibility, while training an autoregressive TTS model from scratch on LibriQuote enhances its expressiveness. Benchmarking on LibriQuote-test highlights significant variability across systems in generating expressive speech. We publicly release the dataset, code, and evaluation resources to facilitate reproducibility. Audio samples can be found at https://libriquote.github.io/.
Anthology ID:
2026.findings-acl.308
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6194–6215
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.308/
Cite (ACL):
Gaspard Michel, Elena V. Epure, and Christophe Cerisara. 2026. Computational Narrative Understanding for Expressive Text-to-Speech. In Findings of the Association for Computational Linguistics: ACL 2026, pages 6194–6215, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Computational Narrative Understanding for Expressive Text-to-Speech (Michel et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.308.pdf
Checklist:
 2026.findings-acl.308.checklist.pdf