2022
Investigating Inter- and Intra-speaker Voice Conversion using Audiobooks
Aghilas Sini | Damien Lolive | Nelly Barbot | Pierre Alain
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Audiobook readers play with their voices to emphasize some text passages, to highlight discourse changes or significant events, or to make listening easier and more entertaining. Dialogue is a central kind of passage in audiobooks, where the reader applies significant voice transformations, mainly prosodic modifications, to convey character properties and changes. However, these intra-speaker modifications are hard to reproduce with simple text-to-speech synthesis. The manner of vocalizing the characters involved in a given story depends on the text style and differs from one speaker to another. In this work, this problem is investigated through the prism of voice conversion. We propose to explore modifying the narrator’s voice to fit the context of the story, such as the character who is speaking, using voice conversion. To this end, two complementary experiments are designed. The first one aims to assess the quality of our Phonetic PosteriorGrams (PPG)-based voice conversion system using parallel data. Subjective evaluations with naive raters are conducted to estimate the quality of the generated signal and the speaker similarity. The second experiment applies intra-speaker voice conversion, treating narration passages and direct speech passages as two distinct speakers. The data are then non-parallel, and the dissimilarity between character and narrator is measured subjectively.
Techniques de synthèse vocale neuronale à l’épreuve des données d’apprentissage non dédiées : les livres audio amateurs en français [Neural speech synthesis techniques put to the test with non-dedicated training data: amateur French audio books]
Aghilas Sini | Lily Wadoux | Antoine Perquin | Gaëlle Vidal | David Guennec | Damien Lolive | Pierre Alain | Nelly Barbot | Jonathan Chevelu | Arnaud Delhay
Traitement Automatique des Langues, Volume 63, Numéro 2 : Traitement automatique des langues intermodal et multimodal [Cross-modal and multimodal natural language processing]
2015
Large Linguistic Corpus Reduction with SCP Algorithms
Nelly Barbot | Olivier Boëffard | Jonathan Chevelu | Arnaud Delhay
Computational Linguistics, Volume 41, Issue 3 - September 2015
2012
Évaluation segmentale du système de synthèse HTS pour le français (Segmental evaluation of HTS) [in French]
Sébastien Le Maguer | Nelly Barbot | Olivier Boeffard
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP
Comparing performance of different set-covering strategies for linguistic content optimization in speech corpora
Nelly Barbot | Olivier Boeffard | Arnaud Delhay
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Set covering algorithms are efficient tools for optimal linguistic corpus reduction. The optimality of such a process is directly related to the descriptive features of the sentences of a reference corpus. This article experimentally examines the behaviour of three algorithms: a greedy approach and a Lagrangian relaxation based one giving importance to rare events, and a third one considering the Kullback-Leibler divergence between a reference distribution and the ongoing distribution of events. The analysis of the content of the reduced corpora shows that the first two approaches remain the most effective at compressing a corpus while guaranteeing a minimal content. The variant which minimises the Kullback-Leibler divergence guarantees a distribution of events close to a reference distribution, as expected; however, the price for this solution is a much larger corpus. In the proposed experiments, we have also evaluated a mixed approach that adds a random complement to the smallest coverings.
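As a rough illustration of the divergence criterion mentioned in this abstract, the sketch below compares a reference distribution of events with the distribution of a reduced corpus. It is not the authors' implementation: the function names, the toy unit labels, the direction of the divergence (reference relative to selection) and the small `eps` floor for unseen events are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def distribution(units: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each unit (hypothetical helper)."""
    counts = Counter(units)
    total = sum(counts.values())
    return {unit: count / total for unit, count in counts.items()}


def kl_divergence(p: Dict[str, float], q: Dict[str, float], eps: float = 1e-12) -> float:
    """D(p || q) in nats; q is floored at eps so unseen events stay finite (assumption)."""
    return sum(
        p_val * math.log(p_val / max(q.get(unit, 0.0), eps))
        for unit, p_val in p.items()
        if p_val > 0
    )


# Toy data: unit labels are invented for illustration only.
reference = distribution(["a", "a", "b", "c"])
balanced_subset = distribution(["a", "a", "b", "c"])
skewed_subset = distribution(["a", "b", "b", "b"])

same = kl_divergence(reference, balanced_subset)
skewed = kl_divergence(reference, skewed_subset)
```

A selection strategy driven by this criterion would prefer sentences that keep the divergence low, which tends to require more sentences than a pure minimal covering.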
2008
Comparing Set-Covering Strategies for Optimal Corpus Design
Jonathan Chevelu | Nelly Barbot | Olivier Boeffard | Arnaud Delhay
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This article addresses the problem of the linguistic content of a speech corpus. Depending on the target task, the phonological and linguistic content of the corpus is controlled by collecting a set of sentences which covers a preset description of phonological attributes under the constraint of an overall duration as small as possible. This goal is classically achieved by greedy algorithms, which however do not guarantee the optimality of the desired cover. In recent work, a Lagrangian-based algorithm called LamSCP has been used to extract coverings of diphonemes from a large corpus in French, giving better results than a greedy algorithm. We propose to continue comparing both algorithms in terms of shortest duration, stability and robustness by computing multi-represented diphoneme or triphoneme coverings. These coverings, extracted from a corpus in English, correspond to very large scale optimization problems. For each experiment, LamSCP improves on the greedy results by 3.9 to 9.7 percent.
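The greedy baseline described in this abstract can be sketched as a duration-weighted set cover. This is a minimal sketch and not the paper's code: the function name `greedy_cover`, the selection rule (lowest duration per newly covered unit) and the toy sentences are assumptions, and it covers each unit once rather than handling the multi-represented case.

```python
from typing import Dict, List, Set, Tuple


def greedy_cover(
    sentences: Dict[str, Tuple[float, Set[str]]],
    required: Set[str],
) -> List[str]:
    """Select sentence ids until every required unit is covered.

    `sentences` maps an id to (duration in seconds, units it contains).
    Each step picks the sentence with the lowest duration per newly covered unit.
    """
    uncovered = set(required)
    chosen: List[str] = []
    while uncovered:
        best_id, best_ratio = None, float("inf")
        for sid, (duration, units) in sentences.items():
            gain = len(units & uncovered)
            if gain == 0:
                continue  # adds nothing new, including already chosen sentences
            ratio = duration / gain
            if ratio < best_ratio:
                best_id, best_ratio = sid, ratio
        if best_id is None:
            raise ValueError("some required units appear in no sentence")
        chosen.append(best_id)
        uncovered -= sentences[best_id][1]
    return chosen


# Toy corpus: ids, durations and diphoneme labels are invented for illustration.
corpus = {
    "s1": (4.0, {"a-b", "b-c", "c-d"}),
    "s2": (1.0, {"a-b"}),
    "s3": (1.5, {"b-c", "c-d"}),
    "s4": (3.0, {"c-d", "d-e"}),
}
targets = {"a-b", "b-c", "c-d", "d-e"}
selection = greedy_cover(corpus, targets)
total_duration = sum(corpus[sid][0] for sid in selection)
```

Because each choice is only locally best, the resulting cover is not guaranteed to have the shortest possible total duration, which is the gap a Lagrangian-based method such as LamSCP aims to narrow.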