Denis Jouvet

2022

pdf abs
Adapting Language Models When Training on Privacy-Transformed Data
Tugtekin Turan | Dietrich Klakow | Emmanuel Vincent | Denis Jouvet
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In recent years, voice-controlled personal assistants have revolutionized the interaction with smart devices and mobile applications. The collected data are then used by system providers to train language models (LMs). Each spoken message reveals personal information, hence removing private information from the input sentences is necessary. Our data sanitization process relies on recognizing and replacing named entities by other words from the same class. However, this may harm LM training because privacy-transformed data is unlikely to match the test distribution. This paper aims to fill the gap by focusing on the adaptation of LMs initially trained on privacy-transformed sentences using a small amount of original untransformed data. To do so, we combine class-based LMs, which provide an effective approach to overcome data sparsity in the context of n-gram LMs, and neural LMs, which handle longer contexts and can yield better predictions. Our experiments show that training an LM on privacy-transformed data result in a relative 11% word error rate (WER) increase compared to training on the original untransformed data, and adapting that model on a limited amount of original untransformed data leads to a relative 8% WER improvement over the model trained solely on privacy-transformed data.

2020

pdf abs
Adaptation de domaine non supervisée pour la reconnaissance de la langue par régularisation d’un réseau de neurones (Unsupervised domain adaptation for language identification by regularization of a neural network)
Raphaël Duroselle | Denis Jouvet | Irina Illina
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 1 : Journées d'Études sur la Parole

Les systèmes automatiques d’identification de la langue subissent une dégradation importante de leurs performances quand les caractéristiques acoustiques des signaux de test diffèrent fortement des caractéristiques des données d’entraînement. Dans cet article, nous étudions l’adaptation de domaine non supervisée d’un système entraîné sur des conversations téléphoniques à des transmissions radio. Nous présentons une méthode de régularisation d’un réseau de neurones consistant à ajouter à la fonction de coût un terme mesurant la divergence entre les deux domaines. Des expériences sur le corpus OpenSAD15 nous permettent de sélectionner la Maximum Mean Discrepancy pour réaliser cette mesure. Cette approche est ensuite appliquée à un système moderne d’identification de la langue reposant sur des x-vectors. Sur le corpus RATS, pour sept des huit canaux radio étudiés, l’approche permet, sans utiliser de données annotées du domaine cible, de surpasser la performance d’un système entraîné de façon supervisée avec des données annotées de ce domaine.

pdf abs
Étude comparative de corrélats prosodiques de marqueurs discursifs français et anglais selon leur fonction pragmatique (Comparative study on prosodic correlates of discourse markers in French and English according to their pragmatic function)
Lou Lee | Denis Jouvet | Katarina Bartkova | Yvon Keromnes | Mathilde Dargnat
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 1 : Journées d'Études sur la Parole

Ce papier présente une étude des caractéristiques prosodiques de marqueurs discursifs en fonction de leur sens pragmatique. L’étude est menée sur trois marqueurs discursifs français (alors, bon, donc) et trois marqueurs anglais (now, so, well) afin de comparer leurs caractéristiques prosodiques dans ces deux langues. Plusieurs paramètres prosodiques ont été calculés sur les marqueurs discursifs, et analysés selon les fonctions pragmatiques de ceux-ci. L’analyse a été effectuée sur plusieurs centaines d’occurrences de marqueurs discursifs extraits de corpus oraux français et anglais. Les résultats montrent que certaines fonctions pragmatiques des marqueurs discursifs amènent leurs propres caractéristiques prosodiques au niveau des pauses et des mouvements de la fréquence fondamentale. On observe également que les fonctions pragmatiques similaires partagent fréquemment des caractéristiques prosodiques similaires à travers les deux langues.

pdf abs
Projet AMIS : résumé et traduction automatique de vidéos (AMIS project : automatic summarization and translation of videos)
Mohamed Amine Menacer | Dominique Fohr | Denis Jouvet | Karima Abidi | David Langlois | Kamel Smaïli
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 4 : Démonstrations et résumés d'articles internationaux

La démonstration de résumé et de traduction automatique de vidéos résulte de nos travaux dans le projet AMIS. L’objectif du projet était d’aider un voyageur à comprendre les nouvelles dans un pays étranger. Pour cela, le projet propose de résumer et traduire automatiquement une vidéo en langue étrangère (ici, l’arabe). Un autre objectif du projet était aussi de comparer les opinions et sentiments exprimés dans plusieurs vidéos comparables. La démonstration porte sur l’aspect résumé, transcription et traduction. Les exemples montrés permettront de comprendre et mesurer qualitativement les résultats du projet.

2017

Automatic speech recognition for Arabic is a very challenging task. Despite all the classical techniques for Automatic Speech Recognition (ASR), which can be efficiently applied to Arabic speech recognition, it is essential to take into consideration the language specificities to improve the system performance. In this article, we focus on Modern Standard Arabic (MSA) speech recognition. We introduce the challenges related to Arabic language, namely the complex morphology nature of the language and the absence of the short vowels in written text, which leads to several potential vowelization for each graphemes, which is often conflicting. We develop an ASR system for MSA by using Kaldi toolkit. Several acoustic and language models are trained. We obtain a Word Error Rate (WER) of 14.42 for the baseline system and 12.2 relative improvement by rescoring the lattice and by rewriting the output with the right Z hamoza above or below Alif.

2016

The IFCASL corpus is a French-German bilingual phonetic learner corpus designed, recorded and annotated in a project on individualized feedback in computer-assisted spoken language learning. The motivation for setting up this corpus was that there is no phonetically annotated and segmented corpus for this language pair of comparable of size and coverage. In contrast to most learner corpora, the IFCASL corpus incorporate data for a language pair in both directions, i.e. in our case French learners of German, and German learners of French. In addition, the corpus is complemented by two sub-corpora of native speech by the same speakers. The corpus provides spoken data by about 100 speakers with comparable productions, annotated and segmented on the word and the phone level, with more than 50% manually corrected data. The paper reports on inter-annotator agreement and the optimization of the acoustic models for forced speech-text alignment in exercises for computer-assisted pronunciation training. Example studies based on the corpus data with a phonetic focus include topics such as the realization of /h/ and glottal stop, final devoicing of obstruents, vowel quantity and quality, pitch range, and tempo.

2015

pdf
Qualitative investigation of the display of speech recognition results for communication with deaf people
Agnès Piquard-Kipffer | Odile Mella | Jérémy Miranda | Denis Jouvet | Luiza Orosanu
Proceedings of SLPAT 2015: 6th Workshop on Speech and Language Processing for Assistive Technologies

2014

We present the design of a corpus of native and non-native speech for the language pair French-German, with a special emphasis on phonetic and prosodic aspects. To our knowledge there is no suitable corpus, in terms of size and coverage, currently available for the target language pair. To select the target L1-L2 interference phenomena we prepare a small preliminary corpus (corpus1), which is analyzed for coverage and cross-checked jointly by French and German experts. Based on this analysis, target phenomena on the phonetic and phonological level are selected on the basis of the expected degree of deviation from the native performance and the frequency of occurrence. 14 speakers performed both L2 (either French or German) and L1 material (either German or French). This allowed us to test, recordings duration, recordings material, the performance of our automatic aligner software. Then, we built corpus2 taking into account what we learned about corpus1. The aims are the same but we adapted speech material to avoid too long recording sessions. 100 speakers will be recorded. The corpus (corpus1 and corpus2) will be prepared as a searchable database, available for the scientific community after completion of the project.

2012

pdf
Détection de transcriptions incorrectes de parole non-native dans le cadre de l’apprentissage de langues étrangères (Detection of incorrect transcriptions of non-native speech in the context of foreign language learning) [in French]
Luiza Orosanu | Denis Jouvet | Dominique Fohr | Irina Illina | Anne Bonneau
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP

pdf
Génération des prononciations de noms propres à l’aide des Champs Aléatoires Conditionnels (Pronunciation generation for proper names using Conditional Random Fields) [in French]
Irina Illina | Dominique Fohr | Denis Jouvet
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP

pdf
Exploitation d’une marge de tolérance de classification pour améliorer l’apprentissage de modèles acoustiques de classes en reconnaissance de la parole (Exploitation of a classification tolerance margin for improving the estimation of class-based acoustic models for speech recognition) [in French]
Denis Jouvet | Arseniy Gorin | Nicolas Vinuesa
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP