2022
Aesop’s fable “The North Wind and the Sun” Used as a Rosetta Stone to Extract and Map Spoken Words in Under-resourced Languages
Elena Knyazeva
|
Philippe Boula de Mareüil
|
Frédéric Vernier
Proceedings of the Thirteenth Language Resources and Evaluation Conference
This paper describes a method of semi-automatic word spotting in minority languages, from one and the same Aesop fable “The North Wind and the Sun” translated into Romance languages/dialects from Hexagonal (i.e. Metropolitan) France and languages from French Polynesia. The first task consisted of finding out how a dozen words such as “wind” and “sun” were translated in over 200 versions collected in the field, taking advantage of orthographic similarity, word position and context. Occurrences of the translations were then extracted from the phone-aligned recordings. The results were judged accurate in 96–97% of cases, both on the development corpus and a test set of unseen data. Corrected alignments were then mapped and basemaps were drawn to make various linguistic phenomena immediately visible. The paper exemplifies how regular expressions may be used for this purpose. The final result, which takes the form of an online speaking atlas (enriching the https://atlas.limsi.fr website), enables us to illustrate lexical, morphological or phonetic variation.
2020
Neural Text-to-Speech Synthesis for an Under-Resourced Language in a Diglossic Environment: the Case of Gascon Occitan
Ander Corral
|
Igor Leturia
|
Aure Séguier
|
Michäel Barret
|
Benaset Dazéas
|
Philippe Boula de Mareüil
|
Nicolas Quint
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
Occitan is a minority language spoken in Southern France, some Alpine Valleys of Italy, and the Val d’Aran in Spain, which only very recently started developing language and speech technologies. This paper describes the first project for designing a Text-to-Speech synthesis system for one of its main regional varieties, namely Gascon. We used a state-of-the-art deep neural network approach, the Tacotron2-WaveGlow system. However, we faced two additional challenges: on the one hand, we wanted to test whether it was possible to obtain good quality results with fewer recording hours than is usually reported for such systems; on the other hand, we needed to achieve a standard, non-Occitan pronunciation of French proper names, so we needed to record French words and test phoneme-based approaches. The evaluation carried out over the various developed systems and approaches shows promising results with near production-ready quality. It has also allowed us to detect the phenomena for which some flaws or drops in quality occur, pointing to directions for future work to improve the quality of the current system and to build new systems for other language varieties and voices.
Automatic Extraction of Verb Paradigms in Regional Languages: the case of the Linguistic Crescent varieties
Elena Knyazeva
|
Gilles Adda
|
Philippe Boula de Mareüil
|
Maximilien Guérin
|
Nicolas Quint
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
Language documentation is crucial for endangered varieties all over the world. Verb conjugation is a key aspect of this documentation for Romance varieties such as those spoken in central France, in the area of the Linguistic Crescent, which extends over significant portions of the old provinces of Marche and Bourbonnais. We present a first methodological experiment using automatic speech processing tools for the extraction of verbal paradigms collected and recorded during fieldwork sessions conducted in situ. In order to prove the feasibility of the approach, we test it with different protocols, on good quality data, and we offer possible ways of extending this research.
2018
Crowdsourcing Regional Variation Data and Automatic Geolocalisation of Speakers of European French
Jean-Philippe Goldman
|
Yves Scherrer
|
Julie Glikman
|
Mathieu Avanzi
|
Christophe Benzitoun
|
Philippe Boula de Mareüil
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
A Speaking Atlas of the Regional Languages of France
Philippe Boula de Mareüil
|
Albert Rilliard
|
Frédéric Vernier
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2016
Cartopho : un site web de cartographie de variantes de prononciation en français (Cartopho: a website for mapping pronunciation variants in French)
Philippe Boula de Mareüil
|
Jean-Philippe Goldman
|
Albert Rilliard
|
Yves Scherrer
|
Frédéric Vernier
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP
This work sets out to renew traditional dialectological atlases by mapping pronunciation variants in French through a website. The web is used not only to collect data but also to disseminate the results to researchers and the general public. The crowdsourcing-based methodology allowed us to gather information from 2,500 French speakers in Europe (France, Belgium, Switzerland). A dynamic platform with a user-friendly interface was then developed to map the pronunciation of 70 words across the regions of the countries concerned (in particular words with a mid vowel, or whose final consonant may or may not be pronounced). This article presents the visualisation options, by département/canton/province or by region, which combine several pronunciation features and sets of words in the form of coloured dots, hatching, etc. One can thus immediately observe a more closed /E/ (as well as a more open /O/) in Nord-Pas-de-Calais and the south of France for words such as parfait or rose, and a more closed /Œ/ in Switzerland for a word such as gueule, for example.
Pics mélodiques prétoniques en portugais brésilien : une étude quantitative (Pre-stress pitch peaks in Brazilian Portuguese: a quantitative study)
Plínio Barbosa
|
Philippe Boula de Mareüil
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP
This work addresses a prosodic feature fairly typical of Brazilian Portuguese: a pitch peak in pre-stress position at the end of declarative utterances. It aims to quantify the phenomenon, based on recordings of five men and five women from the state of São Paulo, in read and narrative speech. The results show that rises of 4 semitones on pre-stressed syllables, followed by falls of 8 semitones, on average, are observed in both speaking styles for women. For men, these values are 3 and 7 semitones respectively. These rise-falls of a third and a fifth, respectively, may give Brazilian Portuguese its particular musicality and, since the falls are faster for women, they open up interesting sociolinguistic perspectives.
2012
Questions corses : peut-on mettre en évidence un transfert prosodique du corse vers le français ? (Corsican questions: is there a prosodic transfer from Corsican to French?) [in French]
Philippe Boula de Mareüil
|
Albert Rilliard
|
Paolo Mairano
|
Jean-Pierre Lai
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP
Allongements vocaliques en français de Belgique : approche expérimentale et perceptive (Vowel lengthening in Belgium French: an experimental and perceptual approach) [in French]
Alice Bardiaux
|
Philippe Boula de Mareüil
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP
2008
Annotation and analysis of overlapping speech in political interviews
Martine Adda-Decker
|
Claude Barras
|
Gilles Adda
|
Patrick Paroubek
|
Philippe Boula de Mareüil
|
Benoit Habert
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
To better understand spontaneous speech phenomena and to improve automatic speech recognition (ASR), we present a study on the relationship between the occurrence of overlapping speech segments and disfluencies (filled pauses, repetitions, revisions) in political interviews. First we present our data and our overlap annotation scheme. We detail our choice of overlapping tags and our definition of disfluencies; the observed ratios of the different overlapping tags are examined, as well as their correlation with the speaker role, and we propose two measures to characterise speakers' interacting attitude: the attack/resist ratio and the attack density. We then study the relationship between the overlapping speech segments and the disfluencies in our corpus, before concluding on the perspectives that our experiments offer.
2006
A joint intelligibility evaluation of French text-to-speech synthesis systems: the EvaSy SUS/ACR campaign
Philippe Boula de Mareüil
|
Christophe d’Alessandro
|
Alexander Raake
|
Gérard Bailly
|
Marie-Neige Garcia
|
Michel Morel
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The EVALDA/EvaSy project is dedicated to the evaluation of text-to-speech synthesis systems for the French language. It is subdivided into four components: evaluation of the grapheme-to-phoneme conversion module (Boula de Mareüil et al., 2005), evaluation of prosody (Garcia et al., 2006), evaluation of intelligibility, and global evaluation of the quality of the synthesised speech. This paper reports on the key results of the intelligibility and global evaluation of the synthesised speech. It focuses on intelligibility, assessed on the basis of semantically unpredictable sentences, but a comparison with absolute category rating in terms of e.g. pleasantness and naturalness is also provided. Three diphone systems and three selection systems have been evaluated. It turns out that the most intelligible system (diphone-based) is far from being the one which obtains the best mean opinion score.
A joint prosody evaluation of French text-to-speech synthesis systems
Marie-Neige Garcia
|
Christophe d’Alessandro
|
Gérard Bailly
|
Philippe Boula de Mareüil
|
Michel Morel
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper reports on prosodic evaluation in the framework of the EVALDA/EvaSy project for text-to-speech (TTS) evaluation for the French language. Prosody is evaluated using a prosodic transplantation paradigm. Intonation contours generated by the synthesis systems are transplanted on a common segmental content. Both diphone-based synthesis and natural speech are used. Five TTS systems are tested along with natural voice. The test is a paired preference test (with 19 subjects), using 7 sentences. The results indicate that natural speech consistently obtains the first rank (with an average preference rate of 80%), followed by a selection-based system (72%) and a diphone-based system (58%). However, rather large variations in judgements are observed among subjects and sentences, and in some cases synthetic speech is preferred to natural speech. These results show the remarkable improvement achieved by the best selection-based synthesis systems in terms of prosody. In this way, a new paradigm for evaluation of the prosodic component of TTS systems has been successfully demonstrated.
2004
Automatic Audio and Manual Transcripts Alignment, Time-code Transfer and Selection of Exact Transcripts
C. Barras
|
G. Adda
|
M. Adda-Decker
|
B. Habert
|
P. Boula de Mareüil
|
P. Paroubek
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
The present study focuses on automatic processing of sibling resources of audio and written documents, such as those available in audio archives or for parliament debates: written texts are close but not exact audio transcripts. Such resources deserve attention for several reasons: they represent an interesting testbed for studying differences between written and spoken material and they yield low cost resources for acoustic model training. When automatically transcribing the audio data, regions of agreement between automatic transcripts and written sources allow time-codes to be transferred to the written documents: this may be helpful in an audio archive or audio information retrieval environment. Regions of disagreement can be automatically selected for further correction by human transcribers. This study makes use of 10 hours of French radio interview archives with corresponding press-oriented transcripts. The audio corpus was then transcribed using the LIMSI speech recognizer, resulting in automatic transcripts with an average word error rate of 12%. 80% of the text corpus (with word chunks of at least five words) can be exactly aligned with the automatic transcripts of the audio data. The residual word error rate on these 80% is less than 1%.
2000
A French Phonetic Lexicon with Variants for Speech and Language Processing
Philippe Boula de Mareüil
|
Christophe d’Alessandro
|
François Yvon
|
Véronique Aubergé
|
Jacqueline Vaissière
|
Angélique Amelot
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)