Salima Mdhaffar
2026
Pantagruel: Unified Self-Supervised Encoders for French Text and Speech
Phuong-Hang Le | Valentin Pelloin | Arnault Chatelain | Maryem Bouziane | Mohammed Ghennai | Qianwen Guan | Kirill Milintsevich | Salima Mdhaffar | Aidan Mannion | Nils Defauw | Shuyue Gu | Alexandre Daniel Audibert | Marco Dinarelli | Yannick Estève | Lorraine Goeuriot | Steffen Lalande | Nicolas Hervé | Maximin Coavoux | François Portet | Étienne Ollion | Marie Candito | Maxime Peyrard | Solange Rossato | Benjamin Lecouteux | Aurélie Nardy | Gilles Sérasset | Vincent Segonne | Solène Evain | Diandra Fabre | Didier Schwab
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We release Pantagruel models, a new family of self-supervised encoder models for French text and speech. Instead of predicting modality-tailored targets such as textual tokens or speech units, Pantagruel learns contextualized target representations in the feature space, allowing modality-specific encoders to capture linguistic and acoustic regularities more effectively. Separate models are pre-trained on large-scale French corpora, including Wikipedia, OSCAR and CroissantLLM for text, together with MultilingualLibriSpeech, LeBenchmark, and INA-100k for speech. INA-100k is a newly introduced 100,000-hour corpus of French audio derived from the archives of the Institut National de l’Audiovisuel (INA), the national repository of French radio and television broadcasts, providing highly diverse audio data. We evaluate Pantagruel across a broad range of downstream tasks spanning both modalities, including tasks from standard French benchmarks such as FLUE and LeBenchmark. Across these tasks, Pantagruel models show competitive or superior performance compared to strong French baselines such as CamemBERT, FlauBERT, and LeBenchmark2.0, while maintaining a shared architecture that can seamlessly handle either speech or text inputs. These results confirm the effectiveness of feature-space self-supervised objectives for French representation learning and highlight Pantagruel as a robust foundation for multimodal speech-text understanding.
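The feature-space objective described here is in the spirit of the data2vec family: a student network predicts the contextualized representations of a slowly updated teacher at masked positions, rather than discrete tokens or speech units. A minimal PyTorch sketch of that idea follows, with all names, the masking scheme, and hyperparameters illustrative rather than taken from the Pantagruel implementation:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    # The teacher's weights track an exponential moving average of the student's.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)

def feature_space_loss(student, teacher, x, mask):
    # x: (batch, time, dim) input features; mask: (batch, time) bool, True = masked.
    with torch.no_grad():
        targets = teacher(x)                           # contextualized targets, full input
    x_masked = x.masked_fill(mask.unsqueeze(-1), 0.0)  # crude masking, for the sketch only
    preds = student(x_masked)
    # Regress teacher features at masked positions (data2vec uses a smoothed L1
    # over an average of several top teacher layers; plain MSE here for brevity).
    return F.mse_loss(preds[mask], targets[mask])
```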
Using Multimodal and Language-Agnostic Sentence Embeddings for Abstractive Summarization
Chaimae Chellaf El Hammoud | Salima Mdhaffar | Yannick Estève | Stéphane Huet
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Abstractive summarization aims to generate concise summaries by creating new sentences, allowing for flexible rephrasing. However, this approach can be vulnerable to inaccuracies, particularly ‘hallucinations’ where the model introduces non-existent information. In this paper, we leverage multimodal and multilingual sentence embeddings derived from pre-trained models such as LaBSE, SONAR, and BGE-M3, and feed them into a modified BART-based French model. We introduce a Named Entity Injection mechanism that appends tokenized named entities to the decoder input in order to improve the factual consistency of the generated summary. Our novel framework, SBARThez, is applicable to both text and speech inputs and supports cross-lingual summarization; it shows competitive performance relative to token-level baselines, especially for low-resource languages, while generating more concise and abstract summaries.
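As an illustration of the input side, the sentence embeddings fed to the decoder can come from any of the cited encoders; below is a minimal sketch using the public LaBSE checkpoint via sentence-transformers (the model ID is real, while the summarization wiring in the comments is paraphrased from the abstract, not the released code):

```python
from sentence_transformers import SentenceTransformer

# LaBSE maps sentences from many languages into one shared embedding space.
encoder = SentenceTransformer("sentence-transformers/LaBSE")
sentences = [
    "Le gouvernement a annoncé une réforme des retraites.",
    "Elle entrera en vigueur l'année prochaine.",
]
embeddings = encoder.encode(sentences)   # numpy array of shape (2, 768)
# In an SBARThez-style model, one such vector per source sentence (text or
# transcribed speech) replaces token-level encoder states as decoder input,
# with tokenized named entities appended to improve factual consistency.
```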
SLURP-TN : Resource for Tunisian Dialect Spoken Language Understanding
Haroun Elleuch | Salima Mdhaffar | Yannick Estève | Fethi Bougares
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Spoken Language Understanding (SLU) aims to extract semantic information from spoken user queries and is a core component of task-oriented dialog systems. With the spectacular progress of deep neural network models and the evolution of pre-trained language models, SLU has achieved significant breakthroughs. However, only a few high-resource languages have taken advantage of this progress, due to the absence of SLU resources elsewhere. In this paper, we seek to mitigate this obstacle by introducing SLURP-TN. This dataset was created by recording 55 native speakers uttering sentences in the Tunisian dialect, manually translated from six SLURP domains. The result is a Tunisian dialect SLU dataset comprising 4,165 sentences, amounting to around 5 hours of acoustic material. We also develop a number of Automatic Speech Recognition and SLU models exploiting SLURP-TN. The dataset and baseline models are available at: https://huggingface.co/datasets/Elyadata/SLURP-TN.
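Assuming the repository follows standard Hugging Face datasets conventions, it can presumably be loaded as below; the split name is a guess and may differ:

```python
from datasets import load_dataset

# Repository ID taken from the URL above.
slurp_tn = load_dataset("Elyadata/SLURP-TN")
print(slurp_tn)                    # inspect the available splits and columns
example = slurp_tn["train"][0]     # assumes a "train" split exists
```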
WhiteHouse: Translation of the Casablanca Corpus for Multi-dialectal Arabic Speech Translation
Fethi Bougares | Salima Mdhaffar | Yannick Estève
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Remarkable progress has been made recently in the speech processing of Arabic dialects, primarily due to the availability of large multilingual pre-trained models and the development of multiple well-annotated datasets that support training, fine-tuning, and evaluation of various speech models. However, most existing research on Arabic speech processing has focused mainly on Dialect Identification (DI) and Automatic Speech Recognition (ASR) rather than Automatic Speech Translation (AST). To address this gap, we introduce WhiteHouse, the first multi-dialectal Arabic-English Speech Translation Corpus. WhiteHouse supplements the recently created Casablanca dataset with an English translation for each utterance in the transcripts, resulting in a three-way parallel speech-transcription-translation multi-dialectal Arabic dataset. The WhiteHouse dataset is used to evaluate various state-of-the-art (SoTA) speech translation models. Our experiments show that SoTA speech translation models perform poorly when evaluated under Arabic dialectal conditions. All the data used during training and testing are released for public use and further improvements.
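Scoring a speech translation model on such a three-way corpus typically reduces to comparing the model's English hypotheses against the reference translations, for instance with sacreBLEU; a generic sketch with illustrative data:

```python
import sacrebleu

hypotheses = ["the president spoke about the economy"]    # ST model outputs, one per utterance
references = ["the president talked about the economy"]   # English references from the corpus
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```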
2025
FFSTC 2: Extending the Fongbe to French Speech Translation Corpus
D. Fortuné Kponou | Salima Mdhaffar | Fréjus A. A. Laleye | Eugène Cokou Ezin | Yannick Estève
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
This paper introduces FFSTC 2, an expanded version of the existing Fongbe-to-French speech translation corpus, addressing the critical need for speech recognition and translation resources for African languages. We extended the dataset by adding 36 hours of transcribed audio, bringing the total to 61 hours and thereby enhancing its utility for both automatic speech recognition (ASR) and speech translation (ST) in Fongbe, a low-resource language. Using this enriched corpus, we developed both cascade and end-to-end speech translation systems. Our models employ AfriHuBERT and HuBERT147, two speech encoders specialized for African languages, and the NLLB and mBART models as decoders. We also investigate the use of the SAMU-XLSR approach to inject sentence-level semantic information into the XLS-R 128 model used as an alternative speech encoder. In addition, we introduce a novel diacritic-substitution technique for ASR which, when combined with NLLB, enables a cascade model to achieve a BLEU score of 37.23, compared to 39.60 obtained by the best system using the original diacritics. Among the end-to-end architectures evaluated, those combining data augmentation with NLLB as the decoder achieved the highest scores; SAMU-NLLB reached a BLEU score of 28.43.
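The abstract does not spell out the substitution scheme, but the general idea of manipulating diacritics at the text-normalization stage can be sketched with Unicode decomposition; the snippet below simply strips combining marks, whereas the paper's technique substitutes them:

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    # Decompose characters (NFD), drop combining marks, then recompose (NFC).
    # A substitution scheme would remap the marks instead of deleting them.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)

print(strip_diacritics("ɛ́ ɔ̀"))  # tone marks removed: "ɛ ɔ"
```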
Findings of the IWSLT 2025 Evaluation Campaign
Idris Abdulmumin | Victor Agostinelli | Tanel Alumäe | Antonios Anastasopoulos | Luisa Bentivogli | Ondřej Bojar | Claudia Borg | Fethi Bougares | Roldano Cattoni | Mauro Cettolo | Lizhong Chen | William Chen | Raj Dabre | Yannick Estève | Marcello Federico | Mark Fishel | Marco Gaido | Dávid Javorský | Marek Kasztelnik | Fortuné Kponou | Mateusz Krubiński | Tsz Kin Lam | Danni Liu | Evgeny Matusov | Chandresh Kumar Maurya | John P. McCrae | Salima Mdhaffar | Yasmin Moslem | Kenton Murray | Satoshi Nakamura | Matteo Negri | Jan Niehues | Atul Kr. Ojha | John E. Ortega | Sara Papi | Pavel Pecina | Peter Polák | Piotr Połeć | Ashwin Sankar | Beatrice Savoldi | Nivedita Sethiya | Claytone Sikasote | Matthias Sperber | Sebastian Stüker | Katsuhito Sudoh | Brian Thompson | Marco Turchi | Alex Waibel | Patrick Wilken | Rodolfo Zevallos | Vilém Zouhar | Maike Züfle
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
This paper presents the outcomes of the shared tasks conducted at the 22nd International Workshop on Spoken Language Translation (IWSLT). The workshop addressed seven critical challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, model compression, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks garnered significant participation, with 32 teams submitting their runs. The field’s growing importance is reflected in the increasing diversity of shared task organizers and contributors to this overview paper, representing a balanced mix of industrial and academic institutions. This broad participation demonstrates the rising prominence of spoken language translation in both research and practical applications.
TEDxTN: A Three-way Speech Translation Corpus for Code-Switched Tunisian Arabic - English
Fethi Bougares | Salima Mdhaffar | Haroun Elleuch | Yannick Estève
Proceedings of The Third Arabic Natural Language Processing Conference
In this paper, we introduce TEDxTN, the first publicly available Tunisian Arabic to English speech translation dataset. This work is in line with the ongoing effort to mitigate the data scarcity obstacle for a number of Arabic dialects. We collected, segmented, transcribed and translated 108 TEDx talks following our internally developed annotation guidelines. The collected talks represent 25 hours of code-switched speech, covering speakers with various accents from over 11 different regions of Tunisia. We make the annotation guidelines and corpus publicly available, which will enable the extension of TEDxTN to new talks as they become available. We also report results for strong baseline Speech Recognition and Speech Translation systems using multiple pre-trained and fine-tuned end-to-end models. This corpus is the first open-source, publicly available speech translation corpus of the code-switched Tunisian dialect. We believe it is a valuable resource that can motivate and facilitate further research on the Tunisian dialect.
ELYADATA & LIA at NADI 2025: ASR and ADI Subtasks
Haroun Elleuch | Youssef Saidi | Salima Mdhaffar | Yannick Estève | Fethi Bougares
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
LIA and ELYADATA systems for the IWSLT 2025 low-resource speech translation shared task
Chaimae Chellaf | Haroun Elleuch | Othman Istaiteh | D. Fortuné Kponou | Fethi Bougares | Yannick Estève | Salima Mdhaffar
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
In this paper, we present the approach and system setup of our participation in the IWSLT 2025 low-resource speech translation shared task. We submitted systems for three language pairs, namely Tunisian Arabic to English, North Levantine Arabic to English, and Fongbé to French. Both pipeline and end-to-end speech translation systems were explored for the Tunisian Arabic to English and Fongbé to French pairs, while only pipeline approaches were investigated for the North Levantine Arabic-English translation direction. All our submissions are based on pre-trained models that we further fine-tune on the shared task training data.
2024
Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect
Salima Mdhaffar | Haroun Elleuch | Fethi Bougares | Yannick Estève
Proceedings of the Second Arabic Natural Language Processing Conference
Speech encoders pretrained through self-supervised learning (SSL) have demonstrated remarkable performance in various downstream tasks, including Spoken Language Understanding (SLU) and Automatic Speech Recognition (ASR). For instance, fine-tuning SSL models for such tasks has shown significant potential, leading to improvements in state-of-the-art (SOTA) performance across challenging datasets. In contrast to existing research, this paper contributes by comparing the effectiveness of SSL approaches in the context of (i) the low-resource spoken Tunisian Arabic dialect and (ii) its combination with a low-resource SLU and ASR scenario, where only a few semantic annotations are available for fine-tuning. We conducted experiments with many SSL speech encoders on the TARIC-SLU dataset, using encoders pre-trained on either monolingual or multilingual speech data. Some of them have also been refined, without in-domain or Tunisian data, through multimodal supervised teacher-student learning. This study yields numerous significant findings, which we discuss in the paper.
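A typical fine-tuning setup of the kind compared here attaches a task head to a pretrained SSL encoder; below is a minimal sketch for the ASR case with Hugging Face transformers (the checkpoint ID is a real multilingual SSL model, though not necessarily one evaluated in the paper, and the vocabulary size is illustrative):

```python
from transformers import Wav2Vec2ForCTC

# Load a multilingual SSL encoder and attach a freshly initialized CTC head
# sized to the target character vocabulary (here: a hypothetical Tunisian set).
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    vocab_size=42,               # illustrative vocabulary size
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()   # common practice: keep the CNN front end frozen
# Fine-tuning then minimizes the CTC loss over (audio, transcript) pairs,
# e.g. with the transformers Trainer; SLU fine-tuning instead targets the
# semantic annotations rather than plain transcripts.
```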
TARIC-SLU: A Tunisian Benchmark Dataset for Spoken Language Understanding
Salima Mdhaffar | Fethi Bougares | Renato de Mori | Salah Zaiem | Mirco Ravanelli | Yannick Estève
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In recent years, there has been a significant increase in interest in developing Spoken Language Understanding (SLU) systems. SLU involves extracting a list of semantic information from the speech signal. A major issue for SLU systems is the lack of a sufficient amount of bi-modal (audio and textual semantic annotation) training data. Existing SLU resources are mainly available in high-resource languages such as English, Mandarin and French, and one of the current challenges concerning low-resourced languages is data collection and annotation. In this work, we present a new freely available corpus, named TARIC-SLU, composed of railway transport conversations in the Tunisian dialect that are annotated with dialogue acts and slots. We describe the semantic model of the dataset, the data, and the experiments conducted to build ASR-based and SLU-based baseline models. To facilitate its use, a complete recipe, including data preparation, training and evaluation scripts, has been built and will be integrated into SpeechBrain, a popular open-source conversational AI toolkit based on PyTorch.
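The SpeechBrain recipe mentioned above follows the toolkit's standard pattern, in which a `Brain` subclass defines the forward pass and the loss; a schematic sketch where every module and batch field name is illustrative, not the released recipe:

```python
import speechbrain as sb

class SLUBrain(sb.Brain):
    def compute_forward(self, batch, stage):
        wavs, wav_lens = batch.sig                    # padded waveforms + relative lengths
        feats = self.modules.encoder(wavs, wav_lens)  # e.g. an SSL speech encoder
        return self.modules.decoder(feats)            # log-probabilities over semantic tokens

    def compute_objectives(self, predictions, batch, stage):
        tokens, token_lens = batch.semantics          # dialogue-act / slot token sequences
        return sb.nnet.losses.nll_loss(predictions, tokens, token_lens)

# Training is then driven by a hyperparameter YAML file and a call like
# SLUBrain(modules=..., opt_class=..., hparams=...).fit(epochs, train_set, valid_set).
```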
Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants
Chloe Sekkat | Fanny Leroy | Salima Mdhaffar | Blake Perry Smith | Yannick Estève | Joseph Dureau | Alice Coucke
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Recent works demonstrate that voice assistants do not perform equally well for everyone, but research on demographic robustness of speech technologies is still scarce. This is mainly due to the rarity of large datasets with controlled demographic tags. This paper introduces the Sonos Voice Control Bias Assessment Dataset, an open dataset composed of voice assistant requests for North American English in the music domain (1,038 speakers, 166 hours, 170k audio samples, with 9,040 unique labelled transcripts) with a controlled demographic diversity (gender, age, dialectal region and ethnicity). We also release a statistical demographic bias assessment methodology, at the univariate and multivariate levels, tailored to this specific use case and leveraging spoken language understanding metrics rather than transcription accuracy, which we believe is a better proxy for user experience. To demonstrate the capabilities of this dataset and statistical method to detect demographic bias, we consider a pair of state-of-the-art Automatic Speech Recognition and Spoken Language Understanding models. Results show statistically significant differences in performance across age, dialectal region and ethnicity. Multivariate tests are crucial to shed light on mixed effects between dialectal region, gender and age.
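The univariate side of such a methodology boils down to testing whether a per-speaker performance metric differs across demographic groups; a generic SciPy sketch with made-up numbers (the paper's exact tests and metrics may differ):

```python
from scipy import stats

# Per-speaker SLU error rates grouped by one demographic attribute (illustrative).
groups = {
    "18-35": [0.08, 0.11, 0.09, 0.10],
    "36-55": [0.10, 0.12, 0.11, 0.13],
    "56+":   [0.15, 0.14, 0.16, 0.17],
}
# Kruskal-Wallis: a non-parametric test for a difference across several groups.
stat, p_value = stats.kruskal(*groups.values())
print(f"H = {stat:.2f}, p = {p_value:.4f}")   # a small p suggests a group effect
# Multivariate analyses (e.g. regression with interaction terms) are needed to
# expose the mixed effects between region, gender and age noted above.
```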
2023
ON-TRAC Consortium Systems for the IWSLT 2023 Dialectal and Low-resource Speech Translation Tasks
Antoine Laurent | Souhir Gahbiche | Ha Nguyen | Haroun Elleuch | Fethi Bougares | Antoine Thiol | Hugo Riguidel | Salima Mdhaffar | Gaëlle Laperrière | Lucas Maison | Sameer Khurana | Yannick Estève
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
This paper describes the ON-TRAC consortium speech translation systems developed for the IWSLT 2023 evaluation campaign. Overall, we participated in three speech translation tracks featured in the low-resource and dialect speech translation shared tasks, namely: (i) spoken Tamasheq to written French, (ii) spoken Pashto to written French, and (iii) spoken Tunisian to written English. All our primary submissions are based on an end-to-end speech-to-text neural architecture using a pretrained SAMU-XLSR model as the speech encoder and an mBART model as the decoder. The SAMU-XLSR model is built from XLS-R 128 in order to generate language-agnostic sentence-level embeddings; this training is driven by the LaBSE model, itself trained on a multilingual text dataset. This architecture allows us to improve the input speech representations and achieve significant improvements compared to conventional end-to-end speech translation systems.
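The SAMU-XLSR training signal can be pictured as pulling a pooled utterance-level speech embedding toward the LaBSE embedding of its transcript; a simplified PyTorch sketch (mean pooling stands in for the learned pooling of the original, and all names are illustrative):

```python
import torch.nn.functional as F

def samu_style_loss(speech_encoder, projector, wav_feats, labse_embedding):
    # wav_feats -> frame-level representations from the XLS-R 128 encoder.
    frames = speech_encoder(wav_feats)   # (batch, time, dim)
    pooled = frames.mean(dim=1)          # utterance-level summary vector
    utt_emb = projector(pooled)          # project into the LaBSE space (768-d)
    # Teacher-student objective: maximize cosine similarity with the frozen
    # LaBSE embedding of the corresponding transcript.
    return 1.0 - F.cosine_similarity(utt_emb, labse_embedding, dim=-1).mean()
```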
2022
The Spoken Language Understanding MEDIA Benchmark Dataset in the Era of Deep Learning: data updates, training and evaluation tools
Gaëlle Laperrière | Valentin Pelloin | Antoine Caubrière | Salima Mdhaffar | Nathalie Camelin | Sahar Ghannay | Bassam Jabaian | Yannick Estève
Proceedings of the Thirteenth Language Resources and Evaluation Conference
With the emergence of neural end-to-end approaches for spoken language understanding (SLU), a growing number of studies have been presented on this topic over the last three years. Most of these works address the spoken language understanding domain through a simple task like speech intent detection, and new benchmark datasets related to this task have been produced and shared with the community. In this paper, we focus on the French MEDIA SLU dataset, distributed since 2005 and used as a benchmark dataset for a large number of research works. This dataset has been shown to be the most challenging one among those accessible to the research community. Distributed by ELRA, this corpus has been free for academic research since 2019. Unfortunately, the MEDIA dataset is not really used beyond the French research community. To facilitate its use, a complete recipe, including data preparation, training and evaluation scripts, has been built and integrated into SpeechBrain, an already popular open-source and all-in-one conversational AI toolkit based on PyTorch. This recipe is presented in this paper. In addition, based on the feedback of researchers who have worked on this dataset for several years, some corrections have been made to the initial manual annotation: the new version of the data will also be integrated into the ELRA catalogue, like the original one. Moreover, a significant amount of data collected during the construction of the MEDIA corpus in the 2000s was never used until now: we present the first results reached on this subset, also included in the MEDIA SpeechBrain recipe, which will serve for now as the MEDIA test2. Finally, we discuss evaluation issues.
Impact Analysis of the Use of Speech and Language Models Pretrained by Self-Supervision for Spoken Language Understanding
Salima Mdhaffar | Valentin Pelloin | Antoine Caubrière | Gaëlle Laperrière | Sahar Ghannay | Bassam Jabaian | Nathalie Camelin | Yannick Estève
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Pretrained models obtained through self-supervised learning have recently been introduced for both acoustic and language modeling. Applied to spoken language understanding tasks, these models have shown their great potential by improving the state-of-the-art performance on challenging benchmark datasets. In this paper, we present an error analysis of such models on the French MEDIA benchmark dataset, known as one of the most challenging benchmarks for the slot-filling task among all the benchmarks accessible to the entire research community. One year ago, the state-of-the-art system reached a Concept Error Rate (CER) of 13.6% through the use of an end-to-end neural architecture. Some months later, a cascade approach based on the sequential use of a fine-tuned wav2vec2.0 model and a fine-tuned BERT model reached a CER of 11.2%. This significant improvement raises questions about the types of errors that remain difficult to treat, but also about those that have been corrected using these models pre-trained through self-supervised learning on large amounts of data. This study provides some answers in order to better understand the limits of such models and opens new perspectives for continuing to improve performance.
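For reference, the Concept Error Rate cited above is the analogue of the word error rate computed over semantic concepts after aligning hypothesis and reference annotations:

```latex
\mathrm{CER} = \frac{S + D + I}{N}
```

where S, D and I count substituted, deleted and inserted concepts in the alignment and N is the number of reference concepts.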
2021
ON-TRAC’ systems for the IWSLT 2021 low-resource speech translation and multilingual speech translation shared tasks
Hang Le | Florentin Barbier | Ha Nguyen | Natalia Tomashenko | Salima Mdhaffar | Souhir Gabiche Gahbiche | Benjamin Lecouteux | Didier Schwab | Yannick Estève
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2021: low-resource speech translation and multilingual speech translation. The ON-TRAC Consortium is composed of researchers from three French academic laboratories and an industrial partner: LIA (Avignon Université), LIG (Université Grenoble Alpes), LIUM (Le Mans Université), and researchers from Airbus. A pipeline approach was explored for the low-resource speech translation task, using a hybrid HMM/TDNN automatic speech recognition system fed by wav2vec features, coupled with an NMT system. For the multilingual speech translation task, we investigated the use of a dual-decoder Transformer that jointly transcribes and translates an input speech signal. This model was trained to translate from multiple source languages into multiple target ones.
2020
A Multimodal Educational Corpus of Oral Courses: Annotation, Analysis and Case Study
Salima Mdhaffar | Yannick Estève | Antoine Laurent | Nicolas Hernandez | Richard Dufour | Delphine Charlet | Geraldine Damnati | Solen Quiniou | Nathalie Camelin
Proceedings of the Twelfth Language Resources and Evaluation Conference
This corpus is part of the PASTEL (Performing Automated Speech Transcription for Enhancing Learning) project, which explores the potential of synchronous speech transcription and its application in specific teaching situations. It includes 10 hours of different lectures, manually transcribed and segmented. The main interest of this corpus lies in its multimodal aspect: in addition to speech, the courses were filmed and the written presentation supports (slides) are made available. The dataset may thus serve research in multiple fields, from speech and language to image and video processing, and will be freely available to the research community. In this paper, we first describe in detail the annotation protocol, including an analysis of the manually labeled data. We then propose some possible use cases of the corpus with baseline results. The use cases concern scientific fields from both speech and text processing: language model adaptation, thematic segmentation, and transcription-to-slide alignment.
2019
Apport de l’adaptation automatique des modèles de langage pour la reconnaissance de la parole: évaluation qualitative extrinsèque dans un contexte de traitement de cours magistraux (Contribution of automatic adaptation of language models for speech recognition : extrinsic qualitative evaluation in a context of educational courses)
Salima Mdhaffar | Yannick Estève | Nicolas Hernandez | Antoine Laurent | Solen Quiniou
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts
Despite the known weaknesses of this metric, the performance of different automatic speech recognition systems is generally compared using the word error rate. The automatic transcriptions produced by these systems are increasingly usable and are used in complex natural language processing systems, for example for machine translation, indexing, or information retrieval. Recent studies have proposed metrics to compare the quality of automatic transcriptions from different systems according to the target task. In this study, we wish to measure, qualitatively, the contribution of automatically adapting language models to the domain targeted by a lecture. The transcriptions of the teacher's speech can serve as support for navigating the video recording of the lecture or for enriching its pedagogical content. It is through the prism of these two tasks that we evaluate the contribution of language model adaptation. The experiments were conducted on a corpus of lectures and show how the word error rate is an insufficient metric that masks the actual benefits of language model adaptation.
2018
Le corpus PASTEL pour le traitement automatique de cours magistraux (PASTEL corpus for automatic processing of lectures)
Salima Mdhaffar | Antoine Laurent | Yannick Estève
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN
The PASTEL project studies the acceptability and usability of automatic transcriptions in the context of lectures. The aim is to equip learners so that the information they can access during a session is enriched synchronously and automatically. This enrichment relies on automatic natural language processing applied to the automatic transcriptions. In this article, we present work on the annotation of lecture recordings made within the CominOpenCourseware project. These annotations are intended for experiments on automatic transcription, thematic segmentation, real-time automatic matching with external resources, and more. The corpus comprises more than nine hours of annotated speech. We also present preliminary experiments carried out to evaluate the automatic adaptation of our speech recognition system.