David Doukhan


A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification.
Rémi Uro | David Doukhan | Albert Rilliard | Laetitia Larcher | Anissa-Claire Adgharouamane | Marie Tahon | Antoine Laurent
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper presents a semi-automatic approach to create a diachronic corpus of voices balanced for speaker’s age, gender, and recording period, according to 32 categories (2 genders, 4 age ranges and 4 recording periods). Corpora were selected at French National Institute of Audiovisual (INA) to obtain at least 30 speakers per category (a total of 960 speakers; only 874 have be found yet). For each speaker, speech excerpts were extracted from audiovisual documents using an automatic pipeline consisting of speech detection, background music and overlapped speech removal and speaker diarization, used to present clean speaker segments to human annotators identifying target speakers. This pipeline proved highly effective, cutting down manual processing by a factor of ten. Evaluation of the quality of the automatic processing and of the final output is provided. It shows the automatic processing compare to up-to-date process, and that the output provides high quality speech for most of the selected excerpts. This method is thus recommendable for creating large corpora of known target speakers.


Computer-assisted Speaker Diarization: How to Evaluate Human Corrections
Pierre-Alexandre Broux | David Doukhan | Simon Petitrenaud | Sylvain Meignier | Jean Carrive
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


Designing French Tale Corpora for Entertaining Text To Speech Synthesis
David Doukhan | Sophie Rosset | Albert Rilliard | Christophe d’Alessandro | Martine Adda-Decker
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Text and speech corpora for training a tale telling robot have been designed, recorded and annotated. The aim of these corpora is to study expressive storytelling behaviour, and to help in designing expressive prosodic and co-verbal variations for the artificial storyteller). A set of 89 children tales in French serves as a basis for this work. The tales annotation principles and scheme are described, together with the corpus description in terms of coverage and inter-annotator agreement. Automatic analysis of a new tale with the help of this corpus and machine learning is discussed. Metrics for evaluation of automatic annotation methods are discussed. A speech corpus of about 1 hour, with 12 tales has been recorded and aligned and annotated. This corpus is used for predicting expressive prosody in children tales, above the level of the sentence.