STORiCo: Storytelling TTS for Hindi with Character Voice Modulation

Pavan Tankala, Preethi Jyothi, Preeti Rao, Pushpak Bhattacharyya


Abstract
We present a new Hindi text-to-speech (TTS) dataset and demonstrate its utility for the expressive synthesis of children’s audio stories. The dataset comprises narration by a single female speaker who modifies her voice to portray different story characters. Annotations for dialogue identification, character labelling, and character attribution are provided, all of which are expected to facilitate learning character voices and speaking styles. Experiments are conducted using different versions of the annotated dataset that enable training a multi-speaker TTS model on the single-speaker data. Subjective tests show that the multi-speaker model improves expressiveness and character voice consistency compared to the baseline single-speaker TTS. With the multi-speaker model, objective evaluations show comparable word error rates, better speaker voice consistency, and higher correlations with ground-truth emotion attributes. We release a new 16.8-hour storytelling speech dataset in Hindi and propose effective solutions for expressive TTS with narrator voice modulation and character voice consistency.
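The abstract does not spell out how the annotated versions turn one narrator into several "speakers". One plausible reading, sketched below under stated assumptions, is to map the dialogue-identification and character-attribution annotations onto pseudo-speaker IDs that a multi-speaker TTS model can condition on. The `Utterance` fields, the three granularity names, and the `<narrator>`, `<dialogue>`, and `<unknown>` labels are hypothetical illustrations, not the dataset's actual schema or the authors' exact procedure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical pseudo-speaker labels (assumptions, not the dataset's schema).
NARRATOR = "<narrator>"
DIALOGUE = "<dialogue>"
UNKNOWN = "<unknown>"


@dataclass
class Utterance:
    text: str
    is_dialogue: bool         # assumed output of dialogue identification
    character: Optional[str]  # assumed output of character attribution; None if unattributed


def assign_speaker_ids(
    utterances: List[Utterance], granularity: str = "character"
) -> Tuple[List[int], Dict[str, int]]:
    """Assign pseudo-speaker IDs so single-speaker audio can train a multi-speaker TTS model.

    granularity (hypothetical dataset "versions"):
      "single"    - every utterance is speaker 0 (single-speaker baseline)
      "dialogue"  - narration is speaker 0, all dialogue shares speaker 1
      "character" - narration is speaker 0, each attributed character gets its own ID
    """
    if granularity not in ("single", "dialogue", "character"):
        raise ValueError(f"unknown granularity: {granularity}")

    table: Dict[str, int] = {NARRATOR: 0}
    ids: List[int] = []
    for utt in utterances:
        if granularity == "single" or not utt.is_dialogue:
            label = NARRATOR
        elif granularity == "dialogue":
            label = DIALOGUE
        else:
            # Dialogue without a character attribution falls back to a shared speaker.
            label = utt.character or UNKNOWN
        # New labels get the next free integer ID, in order of first appearance.
        if label not in table:
            table[label] = len(table)
        ids.append(table[label])
    return ids, table


# Toy story with English placeholder text (not taken from the dataset).
story = [
    Utterance("Once upon a time...", False, None),
    Utterance("I am hungry!", True, "lion"),
    Utterance("Run away!", True, "rabbit"),
    Utterance("Then the lion said...", False, None),
    Utterance("Stop!", True, "lion"),
]

ids, table = assign_speaker_ids(story)
print(ids)    # [0, 1, 2, 0, 1]
print(table)  # {'<narrator>': 0, 'lion': 1, 'rabbit': 2}
```

With `granularity="character"`, each character keeps the same ID throughout the story, which is the property a learned speaker embedding would need in order to encourage character voice consistency; the `"single"` setting reduces to a single-speaker baseline.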
Anthology ID: 2024.eacl-short.37
Volume: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: March
Year: 2024
Address: St. Julian’s, Malta
Editors: Yvette Graham, Matthew Purver
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 426–431
URL: https://aclanthology.org/2024.eacl-short.37
Cite (ACL): Pavan Tankala, Preethi Jyothi, Preeti Rao, and Pushpak Bhattacharyya. 2024. STORiCo: Storytelling TTS for Hindi with Character Voice Modulation. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 426–431, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal): STORiCo: Storytelling TTS for Hindi with Character Voice Modulation (Tankala et al., EACL 2024)
PDF: https://preview.aclanthology.org/nschneid-patch-4/2024.eacl-short.37.pdf
Video: https://preview.aclanthology.org/nschneid-patch-4/2024.eacl-short.37.mp4