Diandra Fabre


2026

Current evaluation practices in speech processing systems often overlook the diversity of spoken accents, leading to significant performance disparities across speaker groups. This issue largely comes from biases and imbalances in training corpora, and is further compounded by the scarcity of open-source datasets suitable for evaluating accent variability in French. To address this gap, we extend the CFPR dataset with explicit accent labels, providing a new benchmark for assessing the robustness of speech technology systems across diverse French accents. We additionally conduct a perceptual study with 87 human participants to evaluate the reliability and interpretability of these labels. Using this resource, we evaluated an eight-class French accent classifier trained on Common Voice data. The first results highlight both the complexity of automatic French accent recognition in low-resource settings, and the difficulty for French-speakers to perceive all the linguistic variabilities in French-speaking countries.
This position paper argues that the under-representation of social science tasks in contemporary LLM benchmarks limits advances in both LLM evaluation and social scientific inquiry. Benchmarks — standardized tools for assessing computational systems — are pivotal in the development of artificial intelligence (AI), including large language models (LLMs). Benchmarks do more than measure progress — they actively structure it, shaping reputations, research agendas, and commercial outcomes. Despite this central role, the social sciences are largely absent from mainstream evaluation frameworks, even though scholars in these fields generate dozens of rigorously annotated, context-sensitive datasets each year. Integrating this work into benchmark design could significantly improve the generalization and robustness of AI models. In turn, models trained on social scientific tasks would likely yield better performance on classic and contemporary tasks in disciplines as diverse as history, sociology, political science or economics. This is all the more pressing as these disciplines are quickly turning to LLMs for assistance. To address this gap, we introduce BenCSSmark, a benchmark composed of datasets annotated by computational social scientists. By integrating social scientific perspectives into benchmarking, BenCSSmark seeks to promote more robust, transparent, and socially relevant AI systems and to foster efficient collaboration.
We release Pantagruel models, a new family of self-supervised encoder models for French text and speech. Instead of predicting modality-tailored targets such as textual tokens or speech units, Pantagruel learns contextualized target representations in the feature space, allowing modality-specific encoders to capture linguistic and acoustic regularities more effectively. Separate models are pre-trained on large-scale French corpora, including Wikipedia, OSCAR and CroissantLLM for text, together with MultilingualLibriSpeech, LeBenchmark, and INA-100k for speech. INA-100k is a newly introduced 100,000-hour corpus of French audio derived from the archives of the Institut National de l’Audiovisuel (INA), the national repository of French radio and television broadcasts, providing highly diverse audio data. We evaluate Pantagruel across a broad range of downstream tasks spanning both modalities, including those from the standard French benchmarks such as FLUE or LeBenchmark. Across these tasks, Pantagruel models show competitive or superior performance compared to strong French baselines such as CamemBERT, FlauBERT, and LeBenchmark2.0, while maintaining a shared architecture that can seamlessly handle either speech or text inputs. These results confirm the effectiveness of feature-space self-supervised objectives for French representation learning and highlight Pantagruel as a robust foundation for multimodal speech-text understanding.

2025

Dans cet article, nous présentons une étude sur la problématique de l’alignement automatique des données dans un corpus constitué de discours en français parlé, sous-titrés en français écrit et interprétés en langue des signes française (LSF). Après une introduction précisant le processus bien particulier de l’interprétation en langue des signes, nous dressons un tour d’horizon des ensembles de données existants pour la LSF ainsi que les spécificités du corpus Matignon-LSF, constitué à partir des comptes-rendus vidéos hebdomadaires du conseil des ministres. Nous montrons ensuite sur quelques exemples certains des phénomènes observés sur la problématique de l’alignement temporel entre les sous-titres synchronisés avec l’audio, et la LSF interprétée qui subit un décalage temporel. Nous en concluons que le niveau d’alignement ne peut pas être celui des phrases en français écrit et proposons quelques pistes pour la suite.
Dans cet article, nous présentons SuperGPQA-HCE-FR, une adaptation française d’un sous-ensemble du benchmark SuperGPQA axé sur les domaines de l’ingénierie hydraulique et du génie civil. Il comprend 285 questions à choix multiples conçues pour évaluer et spécialiser des modèles de langue multilingues de grande taille (LLMs) sur des tâches techniques. La traduction réalisée automatiquement est ensuite évaluée par des experts des domaines. Enfin, nous présentons les premiers résultats sur des modèles Instruct généralistes multilingues en comparant les performances du corpus original en anglais à celles du corpus traduit en français.

2024