Investigating Inter- and Intra-speaker Voice Conversion using Audiobooks

Aghilas Sini, Damien Lolive, Nelly Barbot, Pierre Alain


Abstract
Audiobook readers play with their voices to emphasize text passages, highlight discourse changes or significant events, or make listening easier and more entertaining. Dialog is a central passage type in audiobooks, where the reader applies significant voice transformations, mainly prosodic modifications, to convey character properties and changes. However, these intra-speaker modifications are hard to reproduce with simple text-to-speech synthesis. The manner of vocalizing the characters in a given story depends on the text style and differs from one speaker to another. In this work, this problem is investigated through the prism of voice conversion. We propose to explore modifying the narrator’s voice to fit the context of the story, such as the character who is speaking. To this end, two complementary experiments are designed. The first aims to assess the quality of our Phonetic PosteriorGrams (PPG)-based voice conversion system using parallel data; subjective evaluations with naive raters are conducted to estimate the quality of the generated signal and the speaker similarity. The second experiment applies intra-speaker voice conversion, treating narration passages and direct speech passages as two distinct speakers. The data are then nonparallel, and the dissimilarity between character and narrator is measured subjectively.
Anthology ID:
2022.lrec-1.794
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
7305–7313
URL:
https://aclanthology.org/2022.lrec-1.794
Cite (ACL):
Aghilas Sini, Damien Lolive, Nelly Barbot, and Pierre Alain. 2022. Investigating Inter- and Intra-speaker Voice Conversion using Audiobooks. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 7305–7313, Marseille, France. European Language Resources Association.
Cite (Informal):
Investigating Inter- and Intra-speaker Voice Conversion using Audiobooks (Sini et al., LREC 2022)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2022.lrec-1.794.pdf