Wafa Aissa


2025

Assessing French Readability for Adults with Low Literacy: A Global and Local Perspective
Wafa Aissa | Thibault Bañeras-Roux | Elodie Vanzeveren | Lingyun Gao | Rodrigo Wilkens | Thomas François
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

This study presents a novel approach to assessing French text readability for adults with low literacy skills, addressing both global (full-text) and local (segment-level) difficulty. We introduce a dataset of 461 texts annotated using a difficulty scale developed specifically for this population. Using this corpus, we conducted a systematic comparison of key readability modeling approaches, including machine learning techniques based on linguistic variables, fine-tuning of CamemBERT, a hybrid approach combining CamemBERT with linguistic variables, and the use of generative language models (LLMs) to carry out readability assessment at both global and local levels.

The iRead4Skills Intelligent Complexity Analyzer
Wafa Aissa | Raquel Amaro | David Antunes | Thibault Bañeras-Roux | Jorge Baptista | Alejandro Catala | Luís Correia | Thomas François | Marcos Garcia | Mario Izquierdo-Álvarez | Nuno Mamede | Vasco Martins | Miguel Neves | Eugénio Ribeiro | Sandra Rodriguez Rey | Elodie Vanzeveren
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present the iRead4Skills Intelligent Complexity Analyzer, an open-access platform specifically designed to assist educators and content developers in addressing the needs of low-literacy adults by analyzing and diagnosing text complexity. This multilingual system integrates a range of Natural Language Processing (NLP) components to assess input texts along multiple levels of granularity and linguistic dimensions in Portuguese, Spanish, and French. It assigns four tailored difficulty levels using state-of-the-art models, and introduces four diagnostic yardsticks—textual structure, lexicon, syntax, and semantics—offering users actionable feedback on specific dimensions of textual complexity. Each component of the system is supported by experiments comparing alternative models on manually annotated data.

Modélisation de la lisibilité en français pour les personnes en situation d’illettrisme
Wafa Aissa | Thibault Bañeras-Roux | Elodie Vanzeveren | Lingyun Gao | Alice Pintard | Rodrigo Wilkens | Thomas François
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux

We present a new French readability formula specifically designed for people with low literacy skills. To this end, we built a corpus of 461 texts annotated according to a difficulty scale tailored to this population. We then systematically compared the main approaches to readability, including machine learning based on linguistic variables, fine-tuning of CamemBERT, a hybrid approach combining CamemBERT with linguistic variables, and generative language models (LLMs). An in-depth analysis of these models and their performance is carried out to assess their applicability in real-world settings.

2018

A Reinforcement Learning-driven Translation Model for Search-Oriented Conversational Systems
Wafa Aissa | Laure Soulier | Ludovic Denoyer
Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI

Search-oriented conversational systems rely on information needs expressed in natural language (NL). We focus here on the understanding of NL expressions for building keyword-based queries. We propose a reinforcement-learning-driven translation model framework able to 1) learn the translation from NL expressions to queries in a supervised way, and 2) overcome the lack of large-scale datasets by framing the translation model as a word selection approach and injecting relevance feedback as a reward in the learning process. Experiments are carried out on two TREC datasets. We outline the effectiveness of our approach.