Benjamin Lecouteux

2024

Approches cascade et de bout-en-bout pour la traduction automatique de la parole en pictogrammes
Cécile Macaire | Chloé Dion | Didier Schwab | Benjamin Lecouteux | Emmanuelle Esperança-Rodier
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

The automatic translation of speech into pictograms (Parole-à-Pictos) is a new Natural Language Processing (NLP) task that aims to propose a sequence of pictograms from a spoken utterance. This article explores two distinct approaches: (1) a cascade approach, which combines a speech recognition system with a translation system, and (2) an end-to-end approach, which adapts a speech translation system. We compare different state-of-the-art architectures trained on our own aligned speech-pictogram data. We present a first automatic evaluation of the systems and carry out a human evaluation to analyze their behavior and their impact on pictogram translation. The results highlight the ability of a cascade approach to generate acceptable translations from read speech and in everyday-life contexts.

Une approche par graphe pour l’analyse syntaxique en dépendances de bout en bout de la parole
Adrien Pupier | Maximin Coavoux | Benjamin Lecouteux | Jérôme Goulian
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

Parsing the audio signal directly, rather than going through transcriptions of the audio, is a task recently proposed by Pupier et al. (2022), with the aim of incorporating prosodic information into the parsing model and of bypassing the limitations of a cascade approach that would use an automatic speech recognition (ASR) system followed by a syntactic parser. In this article, we carry out a set of experiments comparing the performance of two families of parsers applied directly to speech: (i) the graph-based approach and (ii) the reduction to a sequence labeling task. We evaluate our approach on a treebank of spoken French. We show that (i) the graph-based approach obtains better results overall and (ii) parsing directly from speech obtains better results than a cascade of systems, despite having 30% fewer parameters.

Technologies de la parole et données de terrain : le cas du créole haïtien
William N. Havard | Renauld Govain | Daphne Gonçalves Teixeira | Benjamin Lecouteux | Emmanuel Schang
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

We use field data in Haitian Creole, collected 40 years ago on cassette tapes and later digitized, to train a native self-supervised learning (SSL) speech model (Wav2Vec2) for Haitian. We use a continued pre-training (CPT) approach on SSL models pre-trained on two foreign languages: the lexifier language, French, and an unrelated language, English. We compare the performance of these three SSL models, and of two other foreign SSL models that were directly fine-tuned, on a speech recognition task. Our results show that the best-performing model is the one trained with a CPT approach on the lexifier language, followed by the native model. We conclude that the "mobilizing the archive" approach advocated by Bird (2020) is a promising way to design speech technologies for new languages.

Pretrained language models (PLMs) are now the de facto backbone of most natural language processing systems. In this article, we present Jargon, a family of PLMs for specialized domains of French, focusing on three domains: transcribed speech, the clinical/biomedical domain, and the legal domain. We use a transformer architecture based on computationally efficient methods (LinFormer), since these domains often involve processing long documents. We evaluate and compare our models to state-of-the-art models on a varied set of tasks and evaluation corpora, some of which are introduced in our article. We gather the datasets into a new French-language evaluation benchmark for these three domains. We also compare various training configurations: continued self-supervised pretraining on the specialized data, pretraining from scratch, as well as single- and multi-domain pretraining. Our extensive experiments in specialized domains show that it is possible to reach competitive downstream performance even when pretraining with LinFormer's approximate attention mechanism. For full reproducibility, we release the models and the pretraining data, as well as the corpora used.

Un corpus multimodal alignant parole, transcription et séquences de pictogrammes dédié à la traduction automatique de la parole vers des pictogrammes
Cécile Macaire | Chloé Dion | Jordan Arrigo | Claire Lemaire | Emmanuelle Esperança-Rodier | Benjamin Lecouteux | Didier Schwab
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 2 : traductions d'articles publiés

The automatic translation of speech into pictograms can facilitate communication between caregivers and people with language impairments. However, there is no established translation formalism, nor any publicly available dataset for training speech-to-pictogram translation systems. This article presents the first dataset aligning speech, text, and pictograms. The corpus comprises more than 230 hours of speech. We discuss the choices we made to create a grammar suited to pictogram sequences; it is built around rules and a restricted vocabulary. The grammar results from an in-depth linguistic study of resources extracted from the ARASAAC website. We then validated these rules through multiple post-editing phases by expert annotators. The proposed corpus is then used to train a cascade system that translates speech into pictograms. The entire corpus is freely available on the Ortolang website under a non-commercial license. It is a starting point for research on the automatic translation of speech into pictograms.

A Multimodal French Corpus of Aligned Speech, Text, and Pictogram Sequences for Speech-to-Pictogram Machine Translation
Cécile Macaire | Chloé Dion | Jordan Arrigo | Claire Lemaire | Emmanuelle Esperança-Rodier | Benjamin Lecouteux | Didier Schwab
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The automatic translation of spoken language into pictogram units can facilitate communication involving individuals with language impairments. However, there is no established translation formalism or publicly available datasets for training end-to-end speech translation systems. This paper introduces the first aligned speech, text, and pictogram translation dataset ever created in any language. We provide a French dataset that contains 230 hours of speech resources. We create a rule-based pictogram grammar with a restricted vocabulary and include a discussion of the strategic decisions involved. It takes advantage of an in-depth linguistic study of resources taken from the ARASAAC website. We validate these rules through multiple post-editing phases by expert annotators. The constructed dataset is then used to experiment with a Speech-to-Pictogram cascade model, which employs state-of-the-art Automatic Speech Recognition models. The dataset is freely available under a non-commercial licence. This marks a starting point to conduct research into the automatic translation of speech into pictogram units.

Pretrained Language Models (PLMs) are the de facto backbone of most state-of-the-art NLP systems. In this paper, we introduce a family of domain-specific PLMs for French, focusing on three important domains: transcribed speech, medicine, and law. We use a transformer architecture based on efficient methods (LinFormer) to maximise their utility, since these domains often involve processing long documents. We evaluate and compare our models to state-of-the-art models on a diverse set of tasks and datasets, some of which are introduced in this paper. We gather the datasets into a new French-language evaluation benchmark for these three domains. We also compare various training configurations: continued pretraining, pretraining from scratch, as well as single- and multi-domain pretraining. Extensive domain-specific experiments show that it is possible to attain competitive downstream performance even when pretraining with the approximate LinFormer attention mechanism. For full reproducibility, we release the models and pretraining data, as well as contributed datasets.

What Has LeBenchmark Learnt about French Syntax?
Zdravko Dugonjić | Adrien Pupier | Benjamin Lecouteux | Maximin Coavoux
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The paper reports on a series of experiments aiming at probing LeBenchmark, a pretrained acoustic model trained on 7k hours of spoken French, for syntactic information. Pretrained acoustic models are increasingly used for downstream speech tasks such as automatic speech recognition, speech translation, spoken language understanding or speech parsing. They are trained on very low-level information (the raw speech signal), and do not have explicit lexical knowledge. Despite that, they obtain reasonable results on tasks that require higher-level linguistic knowledge. As a result, an emerging question is whether these models encode syntactic information. We probe each representation layer of LeBenchmark for syntax, using the Orféo treebank, and observe that it has learnt some syntactic information. Our results show that syntactic information is more easily extractable from the middle layers of the network, after which a very sharp decrease is observed.

Simplification Strategies in French Spontaneous Speech
Lucía Ormaechea | Nikos Tsourakis | Didier Schwab | Pierrette Bouillon | Benjamin Lecouteux
Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024

Automatic Text Simplification (ATS) aims at rewriting texts into simpler variants while preserving their original meaning, so they can be more easily understood by different audiences. While ATS has been widely used for written texts, its application to spoken language remains unexplored, even though spoken language is not free of difficulty either. This study aims to characterize the edit operations performed in order to simplify French transcripts for non-native speakers. To do so, we relied on a data sample randomly extracted from the Orféo-CEFC French spontaneous speech dataset. In the absence of guidelines to direct this process, we adopted an intuitive simplification approach, so as to investigate the crafted simplifications based on expert linguists’ criteria, and to compare them with those produced by a generative AI (namely, ChatGPT). The results, analyzed quantitatively and qualitatively, reveal that the most common edits are deletions, and affect oral production aspects, like restarts or hesitations. Consequently, candidate simplifications are typically register-standardized sentences that solely include the propositional content of the input. The study also examines the alignment between human- and machine-based simplifications, revealing a moderate level of agreement, and highlighting the subjective nature of the task. The findings contribute to understanding the intricacies of simplifying spontaneous spoken language. In addition, the provision of a small-scale parallel dataset derived from such expert simplifications, Propicto-Orféo-Simple, can facilitate the evaluation of speech simplification solutions.

Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech
Adrien Pupier | Maximin Coavoux | Jérôme Goulian | Benjamin Lecouteux
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Direct dependency parsing of the speech signal –as opposed to parsing speech transcriptions– has recently been proposed as a task (Pupier et al. 2022), as a way of incorporating prosodic information in the parsing system and bypassing the limitations of a pipeline approach that would consist of using first an Automatic Speech Recognition (ASR) system and then a syntactic parser. In this article, we report on a set of experiments aiming at assessing the performance of two parsing paradigms (graph-based parsing and sequence-labeling-based parsing) on speech parsing. We perform this evaluation on a large treebank of spoken French, featuring realistic spontaneous conversations. Our findings show that (i) the graph-based approach obtains better results across the board and (ii) parsing directly from speech outperforms a pipeline approach, despite having 30% fewer parameters.

2023

Simple, Simpler and Beyond: A Fine-Tuning BERT-Based Approach to Enhance Sentence Complexity Assessment for Text Simplification
Lucía Ormaechea | Nikos Tsourakis | Didier Schwab | Pierrette Bouillon | Benjamin Lecouteux
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023)

Plateformes pour la création de données en pictogrammes
Cécile Macaire | Jordan Arrigo | Chloé Dion | Claire Lemaire | Emmanuelle Esperança-Rodier | Benjamin Lecouteux | Didier Schwab
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 5 : démonstrations

We present a set of three web interfaces for creating pictogram data as part of the ANR Propicto project. Each has a specific purpose: annotating textual data with ARASAAC pictograms, creating a pictogram vocabulary, and post-editing sentences annotated with pictograms. Although such resources are necessary for machine translation tools targeting pictographic units, almost no annotated resource currently exists. This article presents the specific features of these web platforms (freely available online) and their usefulness.

Voice2Picto : un système de traduction automatique de la parole vers des pictogrammes
Cécile Macaire | Emmanuelle Esperança-Rodier | Benjamin Lecouteux | Didier Schwab
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 5 : démonstrations

We present Voice2Picto, a translation system that proposes a corresponding sequence of pictograms from spoken input. Relying on natural language processing technologies, the tool has two goals: improving access to communication for (1) people who speak a different language (allophones) in a medical emergency context and (2) people with speech difficulties. It will allow hospital staff and families to convey an easily understandable message in pictograms to people who cannot communicate through traditional communication channels (speech, gestures, sign language). In this article, we describe the architecture of the Voice2Picto system and future directions. The application is open source and available in a Git repository: https://github.com/macairececile/Voice2Picto.

Application of Speech Processes for the Documentation of Kréyòl Gwadloupéyen
Éric Le Ferrand | Fabiola Henri | Benjamin Lecouteux | Emmanuel Schang
Proceedings of the Second Workshop on NLP Applications to Field Linguistics

In recent times, there has been a growing number of research studies focused on addressing the challenges posed by low-resource languages and the transcription bottleneck phenomenon. This phenomenon has driven the development of speech recognition methods to transcribe regional and Indigenous languages automatically. Although there is much talk about bridging the gap between speech technologies and field linguistics, there is a lack of documented efficient communication between NLP experts and documentary linguists. The models created for low-resource languages often remain within the confines of computer science departments, while documentary linguistics remains attached to traditional transcription workflows. This paper presents the early stage of a collaboration between NLP experts and field linguists, resulting in the successful transcription of Kréyòl Gwadloupéyen using speech recognition technology.

PROPICTO is a project funded by the French National Research Agency and the Swiss National Science Foundation that aims at creating Speech-to-Pictograph translation systems, with a special focus on French as an input language. By developing such technologies, we intend to enhance communication access for non-French-speaking patients and people with cognitive impairments.

2022

Automatic Speech Recognition and Query By Example for Creole Languages Documentation
Cécile Macaire | Didier Schwab | Benjamin Lecouteux | Emmanuel Schang
Findings of the Association for Computational Linguistics: ACL 2022

We investigate the exploitation of self-supervised models for two Creole languages with few resources: Gwadloupéyen and Morisien. Automatic language processing tools are almost non-existent for these two languages. We propose to use about one hour of annotated data to design an automatic speech recognition system for each language. We evaluate how much data is needed to obtain a query-by-example system that is usable by linguists. Moreover, our experiments show that multilingual self-supervised models are not necessarily the most efficient for Creole languages.

2021

This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2021, low-resource speech translation and multilingual speech translation. The ON-TRAC Consortium is composed of researchers from three French academic laboratories and an industrial partner: LIA (Avignon Université), LIG (Université Grenoble Alpes), LIUM (Le Mans Université), and researchers from Airbus. A pipeline approach was explored for the low-resource speech translation task, using a hybrid HMM/TDNN automatic speech recognition system fed by wav2vec features, coupled to an NMT system. For the multilingual speech translation task, we investigated the use of a dual-decoder Transformer that jointly transcribes and translates input speech. This model was trained in order to translate from multiple source languages to multiple target ones.

2020

Reconnaissance de parole beatboxée à l’aide d’un système HMM-GMM inspiré de la reconnaissance automatique de la parole (Beatbox Sounds Recognition Using a Speech-Dedicated HMM-GMM Based System)
Solène Evain | Adrien Contesse | Antoine Pinchaud | Didier Schwab | Benjamin Lecouteux | Nathalie Henrich Bernardoni
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 1 : Journées d'Études sur la Parole

Human beatboxing is a vocal art that uses the speech organs to produce percussive sounds and imitate musical instruments. Classifying beatbox sounds is currently a challenge. We propose a beatbox sound recognition system inspired by automatic speech recognition. We rely on the Kaldi toolkit, which is widely used for automatic speech recognition (ASR). Our corpus consists of isolated sounds produced by two beatboxers and comprises 80 different sounds. We focused on decoding with monophone acoustic models based on HMM-GMM. The transcription relies on a writing system specific to beatboxers, called Vocal Grammatics (VG). This writing system builds on the concepts of articulatory phonetics.

FlauBERT : des modèles de langue contextualisés pré-entraînés pour le français (FlauBERT : Unsupervised Language Model Pre-training for French)
Hang Le | Loïc Vial | Jibril Frej | Vincent Segonne | Maximin Coavoux | Benjamin Lecouteux | Alexandre Allauzen | Benoît Crabbé | Laurent Besacier | Didier Schwab
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

Pre-trained language models are now indispensable for obtaining state-of-the-art results in many NLP tasks. Taking advantage of the huge amount of raw text available, they make it possible to extract continuous word representations that are contextualized at the sentence level. The effectiveness of these representations for solving several NLP tasks has recently been demonstrated for English. In this article, we present and share FlauBERT, a set of models learned on a large and heterogeneous French corpus. Models of different complexity are trained using the new CNRS Jean Zay supercomputer. We evaluate our language models on various French tasks (text classification, paraphrasing, natural language inference, parsing, word sense disambiguation) and show that they often outperform other approaches on the FLUE evaluation benchmark, which is also presented here.

Providing Semantic Knowledge to a Set of Pictograms for People with Disabilities: a Set of Links between WordNet and Arasaac: Arasaac-WN
Didier Schwab | Pauline Trial | Céline Vaschalde | Loïc Vial | Emmanuelle Esperanca-Rodier | Benjamin Lecouteux
Proceedings of the Twelfth Language Resources and Evaluation Conference

This article presents a resource that links WordNet, the widely known lexical and semantic database, and Arasaac, the largest freely available database of pictograms. Pictograms are a tool that is more and more used by people with cognitive or communication disabilities. However, they are mainly used manually via workbooks, whereas caregivers and families would like to use more automated tools (use speech to generate pictograms, for example). In order to make it possible to use pictograms automatically in NLP applications, we propose a database that links them to semantic knowledge. This resource is particularly interesting for the creation of applications that help people with cognitive disabilities, such as text-to-picto, speech-to-picto, picto-to-speech... In this article, we explain the needs for this database and the problems that have been identified. Currently, this resource combines approximately 800 pictograms with their corresponding WordNet synsets and it is accessible both through a digital collection and via an SQL database. Finally, we propose a method with associated tools to make our resource language-independent: this method was applied to create a first text-to-picto prototype for the French language. Our resource is distributed freely under a Creative Commons license at the following URL: https://github.com/getalp/Arasaac-WN.

Language models have become a key step in achieving state-of-the-art results in many different Natural Language Processing (NLP) tasks. Leveraging the huge amount of unlabeled texts nowadays available, they provide an efficient way to pre-train continuous word representations that can be fine-tuned for a downstream task, along with their contextualization at the sentence level. This has been widely demonstrated for English using contextualized representations (Dai and Le, 2015; Peters et al., 2018; Howard and Ruder, 2018; Radford et al., 2018; Devlin et al., 2019; Yang et al., 2019b). In this paper, we introduce and share FlauBERT, a model learned on a very large and heterogeneous French corpus. Models of different sizes are trained using the new CNRS (French National Centre for Scientific Research) Jean Zay supercomputer. We apply our French language models to diverse NLP tasks (text classification, paraphrasing, natural language inference, parsing, word sense disambiguation) and show that most of the time they outperform other pre-training approaches. Different versions of FlauBERT as well as a unified evaluation protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared with the research community for further reproducible experiments in French NLP.

This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020, offline speech translation and simultaneous speech translation. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). Attention-based encoder-decoder models, trained end-to-end, were used for our submissions to the offline speech translation track. Our contributions focused on data augmentation and ensembling of multiple models. In the simultaneous speech translation track, we build on Transformer-based wait-k models for the text-to-text subtask. For speech-to-text simultaneous translation, we attach a wait-k MT system to a hybrid ASR system. We propose an algorithm to control the latency of the ASR+MT cascade and achieve a good latency-quality trade-off on both subtasks.

2019

Compression de vocabulaire de sens grâce aux relations sémantiques pour la désambiguïsation lexicale (Sense Vocabulary Compression through Semantic Knowledge for Word Sense Disambiguation)
Loïc Vial | Benjamin Lecouteux | Didier Schwab
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume I : Articles longs

In Word Sense Disambiguation (WSD), supervised systems largely dominate evaluation campaigns. However, the performance and coverage of these systems are quickly limited by the small amount of sense-annotated corpora available. In this article, we present two new methods that aim to solve this problem by exploiting semantic relations between senses, such as synonymy, hypernymy and hyponymy, in order to compress the sense vocabulary of WordNet and thus reduce the number of different labels needed to disambiguate all the words of the lexical database. Our methods considerably reduce the size of neural WSD models, with the benefit of improving their coverage without additional data and without affecting their precision. In addition to our methods, we present a WSD system that builds on recent work on contextualized word vector representations to obtain results that largely outperform the state of the art on all WSD evaluation tasks.

Apporter des connaissances sémantiques à un jeu de pictogrammes destiné à des personnes en situation de handicap : Un ensemble de liens entre Princeton WordNet et Arasaac, Arasaac-WN (Giving semantic knowledge to a set of pictograms for people with disabilities: a set of links between WordNet and Arasaac, Arasaac-WN)
Didier Schwab | Pauline Trial | Céline Vaschalde | Loïc Vial | Benjamin Lecouteux
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume IV : Démonstrations

This article presents a resource that links WordNet and Arasaac, the largest freely available database of pictograms. This resource is particularly useful for creating applications that assist people with cognitive disabilities.

The LIG system for the English-Czech Text Translation Task of IWSLT 2019
Loïc Vial | Benjamin Lecouteux | Didier Schwab | Hang Le | Laurent Besacier
Proceedings of the 16th International Conference on Spoken Language Translation

In this paper, we present our submission for the English to Czech Text Translation Task of IWSLT 2019. Our system aims to study how pre-trained language models, used as input embeddings, can improve a specialized machine translation system trained on little data. Therefore, we implemented a Transformer-based encoder-decoder neural system which is able to use the output of a pre-trained language model as input embeddings, and we compared its performance under three configurations: 1) without any pre-trained language model (constrained), 2) using a language model trained on the monolingual parts of the allowed English-Czech data (constrained), and 3) using a language model trained on a large quantity of external monolingual data (unconstrained). We used BERT as the external pre-trained language model (configuration 3), and the BERT architecture for training our own language model (configuration 2). Regarding the training data, we trained our MT system on a small quantity of parallel text: one set consists only of the provided MuST-C corpus, and the other set consists of the MuST-C corpus and the News Commentary corpus from WMT. We observed that using the external pre-trained BERT improves the scores of our system by +0.8 to +1.5 BLEU on our development set, and +0.97 to +1.94 BLEU on the test set. However, using our own language model trained only on the allowed parallel data seems to improve machine translation performance only when the system is trained on the smallest dataset.

Sense Vocabulary Compression through the Semantic Knowledge of WordNet for Neural Word Sense Disambiguation
Loïc Vial | Benjamin Lecouteux | Didier Schwab
Proceedings of the 10th Global Wordnet Conference

In this article, we tackle the issue of the limited quantity of manually sense-annotated corpora for the task of word sense disambiguation, by exploiting the semantic relationships between senses such as synonymy, hypernymy and hyponymy, in order to compress the sense vocabulary of Princeton WordNet, and thus reduce the number of different sense tags that must be observed to disambiguate all words of the lexical database. We propose two different methods that greatly reduce the size of neural WSD models, with the benefit of improving their coverage without additional training data, and without impacting their precision. In addition to our methods, we present a WSD system which relies on pre-trained BERT word vectors in order to achieve results that significantly outperform the state of the art on all WSD evaluation tasks.

2018

Analyzing Learned Representations of a Deep ASR Performance Prediction Model
Zied Elloumi | Laurent Besacier | Olivier Galibert | Benjamin Lecouteux
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

This paper addresses a relatively new task: prediction of ASR performance on unseen broadcast programs. In a previous paper, we presented an ASR performance prediction system using CNNs that encode both text (ASR transcript) and speech, in order to predict word error rate. This work is dedicated to the analysis of speech signal embeddings and text embeddings learnt by the CNN while training our prediction model. We try to better understand which information is captured by the deep model and its relation with different conditioning factors. It is shown that hidden layers convey a clear signal about speech style, accent and broadcast type. We then try to leverage these three types of information at training time through multi-task learning. Our experiments show that this makes it possible to train slightly more efficient ASR performance prediction systems that, in addition, simultaneously tag the analyzed utterances according to their speech style, accent and broadcast program origin.

pdf
Approche supervisée à base de cellules LSTM bidirectionnelles pour la désambiguïsation lexicale [Supervised Approach Based on Bidirectional LSTM Cells for Word Sense Disambiguation]
Loïc Vial | Benjamin Lecouteux | Didier Schwab
Traitement Automatique des Langues, Volume 59, Numéro 1 : Varia [Varia]

pdf
Prédiction de performance des systèmes de reconnaissance automatique de la parole à l’aide de réseaux de neurones convolutifs [Performance prediction of automatic speech recognition systems using convolutional neural networks]
Zied Elloumi | Benjamin Lecouteux | Olivier Galibert | Laurent Besacier
Traitement Automatique des Langues, Volume 59, Numéro 2 : Apprentissage profond pour le traitement automatique des langues [Deep Learning for natural language processing]

pdf
UFSAC: Unification of Sense Annotated Corpora and Tools
Loïc Vial | Benjamin Lecouteux | Didier Schwab
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf abs
Approche supervisée à base de cellules LSTM bidirectionnelles pour la désambiguïsation lexicale (LSTM Based Supervised Approach for Word Sense Disambiguation)
Loïc Vial | Benjamin Lecouteux | Didier Schwab
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

In word sense disambiguation, the use of neural networks is still uncommon and very recent. This direction is nonetheless very promising, since the results obtained by these first systems consistently rank at the top of evaluation campaigns, with what still appears to be substantial room for improvement. In this article, we present a new neural network architecture for word sense disambiguation. Our system is less complex to train than existing neural systems, and it obtains state-of-the-art results on most English word sense disambiguation evaluation tasks. The emphasis is on the reproducibility of our system and our results, through the use of a freely available word vector model, training corpora and evaluation corpora.

2017

pdf
Disentangling ASR and MT Errors in Speech Translation
Ngoc-Tien Le | Benjamin Lecouteux | Laurent Besacier
Proceedings of Machine Translation Summit XVI: Research Track

pdf abs
Traitement des Mots Hors Vocabulaire pour la Traduction Automatique de Document OCRisés en Arabe (This article presents a new system that automatically translates images of Arabic documents)
Kamel Bouzidi | Zied Elloumi | Laurent Besacier | Benjamin Lecouteux | Mohamed-Faouzi Benzeghiba
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 - Articles longs

This article presents an original system for translating scanned Arabic documents. Two modules are cascaded: an Arabic optical character recognition (OCR) system and an Arabic-French machine translation (MT) system. OCR-MT coupling has received little attention in the literature, and the originality of this study lies in proposing a tight coupling between OCR and MT together with a specific treatment of the out-of-vocabulary (OOV) words caused by OCR errors. Lattice-based OCR-MT coupling and our handling of OOV words, which replaces them according to a composite measure that takes into account both the surface form and the context of the word, yield a significant improvement in translation performance. Experiments are carried out on a corpus of scanned Arabic newspapers and yield BLEU improvements of 3.73 and 5.5 on the development and test corpora respectively.

pdf abs
Représentation vectorielle de sens pour la désambiguïsation lexicale à base de connaissances (Sense Embeddings in Knowledge-Based Word Sense Disambiguation)
Loïc Vial | Benjamin Lecouteux | Didier Schwab
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 2 - Articles courts

In this article, we propose a new method for representing the senses of a dictionary as vectors. We take the terms used in each sense's definition, project them into a vector space, and sum the resulting vectors, with weights that depend on their part of speech and their frequency. The resulting sense vector is then used to find related senses, which makes it possible to build a lexical network automatically. The network obtained is then evaluated against the manually built lexical network of WordNet. To do so, we compare the impact of the different networks on a word sense disambiguation system based on the Lesk measure. The advantage of our method is that it can be applied to any language that lacks a lexical network like WordNet's. The results show that our automatically generated network improves the score of the baseline system, almost reaching the quality of the WordNet network.

pdf abs
Uniformisation de corpus anglais annotés en sens (Unification of sense annotated English corpora for word sense disambiguation)
Loïc Vial | Benjamin Lecouteux | Didier Schwab
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 3 - Démonstrations

For English word sense disambiguation, there are currently around fifteen sense-annotated corpora, often in different formats and based on different versions of the Princeton WordNet. We present a format for unifying these corpora, and we provide the community with all the English annotated corpora known to us, with senses unified to Princeton WordNet 3.0 when the licenses allow it, along with the source code for building all of the corpora from the original data.

pdf
Sense Embeddings in Knowledge-Based Word Sense Disambiguation
Loïc Vial | Benjamin Lecouteux | Didier Schwab
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Short papers

2016

pdf abs
Acquisition et reconnaissance automatique d’expressions et d’appels vocaux dans un habitat. (Acquisition and recognition of expressions and vocal calls in a smart home)
Michel Vacher | Benjamin Lecouteux | Frédéric Aman | François Portet | Solange Rossato
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP

This article presents a system able to recognize calls for help from elderly people living at home in order to provide them with assistance. The system uses Automatic Speech Recognition (ASR) technology that must operate in distant-speech conditions and with expressive speech. To guarantee privacy, the system runs locally and recognizes only predefined sentences. The system was evaluated by 17 participants acting out scenarios, including falls, in a Living lab reproducing a living room. The detection error rate obtained, 29%, is encouraging and highlights the challenges to be overcome for this task.

pdf abs
Joint ASR and MT Features for Quality Estimation in Spoken Language Translation
Ngoc-Tien Le | Benjamin Lecouteux | Laurent Besacier
Proceedings of the 13th International Conference on Spoken Language Translation

This paper aims to unravel automatic quality assessment for spoken language translation (SLT). More precisely, we propose several effective estimators based on our estimation of transcription (ASR) quality, translation (MT) quality, or both (combined and joint features using ASR and MT information). Our experiments provide an important opportunity to advance the understanding of how well MT and ASR features predict the quality of words in an SLT output. These results could be applied to interactive speech translation or computer-assisted translation of speeches and lectures. For reproducible experiments, the code for calling our WCE-LIG application and the corpora used are made available to the research community.

Ambient Assisted Living aims at enhancing the quality of life of older and disabled people at home thanks to Smart Homes. In particular, for elderly people living alone at home, the detection of distress situations after a fall is very important to reassure this population. However, many studies do not include tests in real settings, because data collection in this domain is very expensive and challenging and because few data sets are available. The CIRDO corpus is a dataset recorded in realistic conditions in DOMUS, a fully equipped Smart Home with microphones and home automation sensors, in which participants performed scenarios including real falls on a carpet and calls for help. These scenarios were elaborated thanks to a field study involving elderly persons. We present experiments on real-time distress detection using audio and speech analysis, and on fall detection using video analysis. Results show the difficulty of the task. The database can be used as a standardized database by researchers to evaluate and compare their systems for assisting elderly persons.

pdf abs
CirdoX: an on/off-line multisource speech and sound analysis software
Frédéric Aman | Michel Vacher | François Portet | William Duclot | Benjamin Lecouteux
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Vocal User Interfaces in domestic environments have recently gained interest in the speech processing community. This interest is due to the opportunity of using them in the framework of Ambient Assisted Living both for home automation (vocal command) and for calls for help in case of distress situations, i.e. after a fall. CirdoX, which is a modular software, is able to analyse online the audio environment in a home, to extract the uttered sentences and then to process them thanks to an ASR module. Moreover, this system performs non-speech audio event classification; in this case, specific models must be trained. The software is designed to be modular and to process the multichannel audio stream on-line. Some examples of studies in which CirdoX was involved are described. They were conducted in a real environment, namely a Living lab environment.

2015

pdf abs
Utilisation de mesures de confiance pour améliorer le décodage en traduction de parole
Laurent Besacier | Benjamin Lecouteux | Luong Ngoc Quang
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Word-level confidence measures (Word Confidence Estimation, WCE) for machine translation (MT) or automatic speech recognition (ASR) assign a confidence score to each word in a transcription or translation hypothesis. In the past, the estimation of these measures has most often been addressed separately in ASR or MT contexts. Here we propose a joint estimation of the confidence associated with a word in a spoken language translation (SLT) hypothesis. This estimation relies on features from both speech transcription (ASR) systems and machine translation (MT) systems. In addition to building these robust confidence estimators for SLT, we use the confidence information to re-decode our translation hypothesis graphs. Our experiments show that using these confidence measures in a second decoding pass yields a significant improvement in translation performance (evaluated with the BLEU metric: a gain of two points over our baseline speech translation system). These experiments are carried out on a French-English SLT task for which a corpus was specially designed (this corpus, made available to the NLP community, is also described in detail in the article).

pdf
An open-source toolkit for word-level confidence estimation in machine translation
Christophe Servan | Ngoc Tien Le | Ngoc Quang Luong | Benjamin Lecouteux | Laurent Besacier
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers

pdf
Recognition of Distress Calls in Distant Speech Setting: a Preliminary Experiment in a Smart Home
Michel Vacher | Benjamin Lecouteux | Frédéric Aman | Solange Rossato | François Portet
Proceedings of SLPAT 2015: 6th Workshop on Speech and Language Processing for Assistive Technologies

2014

pdf
An efficient two-pass decoder for SMT using word confidence estimation
Ngoc-Quang Luong | Laurent Besacier | Benjamin Lecouteux
Proceedings of the 17th Annual Conference of the European Association for Machine Translation

pdf abs
The Sweet-Home speech and multimodal corpus for home automation interaction
Michel Vacher | Benjamin Lecouteux | Pedro Chahuara | François Portet | Brigitte Meillon | Nicolas Bonnefond
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Ambient Assisted Living aims at enhancing the quality of life of older and disabled people at home thanks to Smart Homes and Home Automation. However, many studies do not include tests in real settings, because data collection in this domain is very expensive and challenging and because few data sets are available. The Sweet-Home multimodal corpus is a dataset recorded in realistic conditions in DOMUS, a fully equipped Smart Home with microphones and home automation sensors, in which participants performed Activities of Daily Living (ADL). This corpus is made of a multimodal subset, a French home automation speech subset recorded in Distant Speech conditions, and two interaction subsets, the first one being recorded by 16 persons without disabilities and the second one by 6 seniors and 5 visually impaired people. This corpus was used in studies related to ADL recognition, context-aware interaction and distant speech recognition applied to home automation controlled through voice.

pdf bib
Word Confidence Estimation for SMT N-best List Re-ranking
Ngoc-Quang Luong | Laurent Besacier | Benjamin Lecouteux
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation

pdf
LIG System for Word Level QE task at WMT14
Ngoc-Quang Luong | Laurent Besacier | Benjamin Lecouteux
Proceedings of the Ninth Workshop on Statistical Machine Translation

2013

pdf
LIG System for WMT13 QE Task: Investigating the Usefulness of Features in Word Confidence Estimation for MT
Ngoc-Quang Luong | Benjamin Lecouteux | Laurent Besacier
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf
Experimental Evaluation of Speech Recognition Technologies for Voice-based Home Automation Control in a Smart Home
Michel Vacher | Benjamin Lecouteux | Dan Istrate | Thierry Joubert | François Portet | Mohamed Sehili | Pedro Chahuara
Proceedings of the Fourth Workshop on Speech and Language Processing for Assistive Technologies

pdf
Driven Decoding for machine translation (Vers un décodage guidé pour la traduction automatique) [in French]
Benjamin Lecouteux | Laurent Besacier
Proceedings of TALN 2013 (Volume 2: Short Papers)

2012

pdf
Reconnaissance d’ordres domotiques en conditions bruitées pour l’assistance à domicile (Recognition of Voice Commands by Multisource ASR and Noise Cancellation in a Smart Home Environment) [in French]
Benjamin Lecouteux | Michel Vacher | François Portet
JEP-TALN-RECITAL 2012, Workshop ILADI 2012: Interactions Langagières pour personnes Agées Dans les habitats Intelligents (ILADI 2012: Language Interaction for Elderly in Smart Homes)

pdf abs
The LIG English to French machine translation system for IWSLT 2012
Laurent Besacier | Benjamin Lecouteux | Marwen Azouzi | Ngoc Quang Luong
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper presents the LIG participation in the E-F MT task of IWSLT 2012. The primary system proposed made a large improvement (more than 3 BLEU points on the tst2010 set) compared to our participation last year. Part of this improvement was due to the use of an extraction from the Gigaword corpus. We also propose a preliminary adaptation of the driven decoding concept for machine translation. This method allows an efficient combination of machine translation systems, by rescoring the log-linear model at the N-best list level according to auxiliary systems: the basic technique essentially guides the search using one or more previous system outputs. The results show that the approach allows a significant improvement in BLEU score using Google Translate to guide our own SMT system. We also tried to use a confidence measure as an additional log-linear feature, but we could not get any improvement with this technique.

pdf bib
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP
Laurent Besacier | Benjamin Lecouteux | Gilles Sérasset
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP

pdf
Reconnaissance automatique de la parole distante dans un habitat intelligent : méthodes multi-sources en conditions réalistes (Distant Speech Recognition in a Smart Home : Comparison of Several Multisource ASRs in Realistic Conditions) [in French]
Benjamin Lecouteux | Michel Vacher | François Portet
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP

pdf
Prédiction de l’indexabilité d’une transcription (Prediction of transcription indexability) [in French]
Grégory Senay | Benjamin Lecouteux | Georges Linarès
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP

2011

pdf abs
LIG English-French spoken language translation system for IWSLT 2011
Benjamin Lecouteux | Laurent Besacier | Hervé Blanchon
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the system developed by the LIG laboratory for the 2011 IWSLT evaluation. We participated in the English-French MT and SLT tasks. The development of a reference translation system (MT task), as well as an ASR output translation system (SLT task), is presented. We focus this year on the SLT task and on the use of multiple 1-best ASR outputs to improve overall translation quality. The main experiment presented here compares the performance of an SLT system where multiple ASR 1-best outputs are combined before translation (source combination) with an SLT system where multiple ASR 1-best outputs are translated, the system combination being conducted afterwards on the target side (target combination). The experimental results show that the second approach (target combination) outperforms the first one when performance is measured with BLEU.

2010

pdf abs
Transcriber Driving Strategies for Transcription Aid System
Grégory Senay | Georges Linarès | Benjamin Lecouteux | Stanislas Oger | Thierry Michel
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Speech recognition technology suffers from a lack of robustness which limits its usability for fully automated speech-to-text transcription, and manual correction is generally required to obtain perfect transcripts. In this paper, we propose a general scheme for semi-automatic transcription, in which the system and the transcriptionist contribute jointly to the speech transcription. The proposed system relies on the editing of confusion networks and on reactive decoding, the latter being expected to benefit from the manual correction and improve the error rates. In order to reduce the correction time, we evaluate various strategies aiming to guide the transcriptionist towards the critical areas of transcripts. These strategies are based on a graph density-based criterion and two semantic consistency criteria, using a corpus-based method and a web search engine. They make it possible to indicate to the user the areas that are severely lacking in understandability. We evaluate these driving strategies by simulating the correction process of French broadcast news transcriptions. Results show that interactive decoding improves the efficiency of the correction process with all driving strategies and that semantic information must be integrated into the interactive decoding process.