Thierry Bazillon


Automatically enriching spoken corpora with syntactic information for linguistic studies
Alexis Nasr | Frederic Bechet | Benoit Favre | Thierry Bazillon | Jose Deulofeu | Andre Valli
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Syntactic parsing of speech transcriptions must cope with disfluencies, which break the syntactic structure of the utterances. We propose in this paper two solutions to this problem. The first relies on a disfluency predictor that detects disfluencies and removes them prior to parsing. The second integrates the disfluencies into the syntactic structure of the utterances and trains a disfluency-aware parser.


Syntactic annotation of spontaneous speech: application to call-center conversation data
Thierry Bazillon | Melanie Deplano | Frederic Bechet | Alexis Nasr | Benoit Favre
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the syntactic annotation process of the DECODA corpus. This corpus contains manual transcriptions of spoken conversations recorded in the French call-center of the Paris Public Transport Authority (RATP). Three levels of syntactic annotation have been performed with a semi-supervised approach: POS tags, syntactic chunks and dependency parses. The main idea is to use off-the-shelf NLP tools and models, originally developed and trained on written text, to perform a first automatic annotation of the manually transcribed corpus. At the same time, a fully manual annotation process is performed on a subset of the original corpus, called the GOLD corpus. An iterative process is then applied, consisting of manually correcting errors found in the automatic annotations, retraining the linguistic models of the NLP tools on this corrected corpus, then checking the quality of the adapted models against the fully manual annotations of the GOLD corpus. This process iterates until a target error rate is reached. This paper describes this process and the main issues arising when adapting NLP tools to process speech transcriptions, and presents the first evaluations performed with these newly adapted tools.
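The iterative correct-retrain-evaluate cycle described in the abstract can be sketched as follows. This is a minimal, self-contained illustration, not the actual DECODA tooling: the function name, parameters and the fixed per-iteration improvement factor are all illustrative assumptions that stand in for the real annotation, correction and retraining steps.

```python
def adapt_until_threshold(initial_error_rate, improvement_factor=0.7,
                          target_error_rate=0.05, max_iterations=20):
    """Simulate the iterative adaptation loop: repeat the
    annotate / correct / retrain cycle until the error rate measured
    on the fully manual GOLD corpus falls below the target.

    The per-cycle gain is modeled here as a fixed multiplicative
    factor, which is a simplifying assumption for illustration only.
    """
    error_rate = initial_error_rate
    iterations = 0
    while error_rate > target_error_rate and iterations < max_iterations:
        # 1. annotate the corpus automatically with the current models
        #    (off-the-shelf written-text models on the first pass)
        # 2. manually correct the errors found in the automatic output
        # 3. retrain the models on the corrected corpus
        # 4. re-evaluate on the GOLD corpus (simulated as a fixed gain)
        error_rate *= improvement_factor
        iterations += 1
    return error_rate, iterations

final_rate, n_cycles = adapt_until_threshold(0.30)
```

With these toy settings, a model starting at a 30% error rate crosses the 5% stopping threshold after a handful of cycles; in practice the gain per cycle depends on how many errors the manual correction pass fixes.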

DECODA: a call-centre human-human spoken conversation corpus
Frederic Bechet | Benjamin Maza | Nicolas Bigouroux | Thierry Bazillon | Marc El-Bèze | Renato De Mori | Eric Arbillot
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The goal of the DECODA project is to reduce the development cost of Speech Analytics systems by reducing the need for manual annotation. This project aims to propose robust speech data mining tools in the framework of call-center monitoring and evaluation, by means of weakly supervised methods. The applicative framework of the project is the call-center of the RATP (Paris public transport authority). This project tackles two very important open issues in the development of speech mining methods from spontaneous speech recorded in call-centers: robustness (how to extract relevant information from very noisy and spontaneous speech messages) and weak supervision (how to reduce the annotation effort needed to train and adapt recognition and classification models). This paper describes the DECODA corpus collected at the RATP during the project. We present the different annotation levels performed on the corpus, the methods used to obtain them, as well as some evaluation of the quality of the annotations produced.


Qui êtes-vous ? Catégoriser les questions pour déterminer le rôle des locuteurs dans des conversations orales (Who are you? Categorizing questions to determine speaker roles in spoken conversations)
Thierry Bazillon | Benjamin Maza | Mickael Rouvier | Frédéric Béchet | Alexis Nasr
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Spoken data mining is a research field that aims to characterize an audio stream containing speech from one or more speakers, using descriptors related to the form and content of the signal. Beyond the automatic word-level transcription of what is said, information about the type of audio stream being processed, as well as the role and identity of the speakers, is also crucial to enable complex queries such as: "find debates on topic X", "find all interviews of Y", etc. In this context, working with conversations recorded during radio or television programs, we study how speakers express questions in conversation, starting from the initial intuition that the form of the questions asked is a signature of the speaker's role in the conversation (presenter, guest, listener, etc.). By proposing a classification of question types and using this information to complement the descriptors generally used in the literature to classify speakers by role, we hope to improve the classification step and, at the same time, validate our initial intuition.

Using MMIL for the High Level Semantic Annotation of the French MEDIA Dialogue Corpus
Lina Maria Rojas-Barahona | Thierry Bazillon | Matthieu Quignard | Fabrice Lefevre
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)


The EPAC Corpus: Manual and Automatic Annotations of Conversational Speech in French Broadcast News
Yannick Estève | Thierry Bazillon | Jean-Yves Antoine | Frédéric Béchet | Jérôme Farinas
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents the EPAC corpus, which is composed of 100 hours of conversational speech manually transcribed, together with the outputs of automatic tools (automatic segmentation, transcription, POS tagging, etc.) applied to the entire French ESTER 1 audio corpus: about 1700 hours of audio recordings from radio shows. This corpus was built during the EPAC project, funded by the French Research Agency (ANR) from 2007 to 2010. It significantly increases the amount of French manually transcribed audio recordings easily available, and it is now included as part of the ESTER 1 corpus in the ELRA catalog at no additional cost. By providing a large set of automatic outputs from speech processing tools, the EPAC corpus should be useful to researchers who want to work on such data without having to develop and maintain such tools. These automatic annotations are varied: segmentation and speaker diarization, one-best hypotheses from the LIUM automatic speech recognition system with confidence measures, but also word lattices and confusion networks, named entities, part-of-speech tags, chunks, etc. The 100 hours of manually transcribed speech were split into three data sets, yielding an official training corpus, an official development corpus and an official test corpus. These data sets were used to develop and evaluate the automatic tools that processed the 1700 hours of audio recordings. For example, on the EPAC test data set our ASR system yields a word error rate of 17.25%.


Manual vs Assisted Transcription of Prepared and Spontaneous Speech
Thierry Bazillon | Yannick Estève | Daniel Luzzati
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Our paper focuses on the gain that can be achieved in human transcription of spontaneous and prepared speech by using the assistance of an ASR system. This experiment has shown interesting results, first concerning the duration of the transcription task itself: even with the combination of prepared speech and ASR, an experienced annotator needs approximately 4 hours to transcribe 1 hour of audio data. Using an ASR system is mostly time-saving, although the gain is much more significant on prepared speech: assisted transcriptions are up to 4 times faster than manual ones. This ratio falls to 2 with spontaneous speech, because of the limits of ASR on such data. Detailed results reveal interesting correlations between the transcription task and phenomena such as word error rate, telephone or non-native speech turns, and the number of fillers or proper nouns. The latter make spelling correction very time-consuming for prepared speech because of their frequency. As a consequence, watching for a low average number of proper nouns may be a way to detect spontaneous speech.