Nuria Gala

Also published as: Nùria Gala, Núria Gala


HECTOR: A Hybrid TExt SimplifiCation TOol for Raw Texts in French
Amalia Todirascu | Rodrigo Wilkens | Eva Rolin | Thomas François | Delphine Bernhard | Núria Gala
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Reducing the complexity of texts by applying an Automatic Text Simplification (ATS) system has been sparking interest inthe area of Natural Language Processing (NLP) for several years and a number of methods and evaluation campaigns haveemerged targeting lexical and syntactic transformations. In recent years, several studies exploit deep learning techniques basedon very large comparable corpora. Yet the lack of large amounts of corpora (original-simplified) for French has been hinderingthe development of an ATS tool for this language. In this paper, we present our system, which is based on a combination ofmethods relying on word embeddings for lexical simplification and rule-based strategies for syntax and discourse adaptations. We present an evaluation of the lexical, syntactic and discourse-level simplifications according to automatic and humanevaluations. We discuss the performances of our system at the lexical, syntactic, and discourse levels

pdf bib
Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference
Rodrigo Wilkens | David Alfter | Rémi Cardon | Núria Gala
Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference

HIBOU: an eBook to improve Text Comprehension and Reading Fluency for Beginning Readers of French
Ludivine Javourey Drevet | Stéphane Dufau | Johannes Christoph Ziegler | Núria Gala
Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference

In this paper, we present HIBOU, an eBook application initially developed for iOs, displaying adapted texts (i.e. simplified), and proposing text comprehension activities. The application has been used in six elementary schools in France to evaluate and train reading fluency and comprehension skills on beginning readers of French. HIBOU displays two versions of French literary and documentary texts from the ALECTOR corpus, the ‘original’, and a simplified version. Text simplifications have been manually performed at the lexical, syntactic, and discursive levels. The child can read in autonomy and has access to different games on word identification. HIBOU is at present being developed to be online in a platform that will be available at elementary schools in France.

PADDLe: a Platform to Identify Complex Words for Learners of French as a Foreign Language (FFL)
Camille Pirali | Thomas François | Núria Gala
Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference

Annotations of word difficulty by readers provide invaluable insights into lexical complexity. Yet, there is currently a paucity of tools allowing researchers to gather such annotations in an adaptable and simple manner. This article presents PADDLe, an online platform aiming to fill that gap and designed to encourage best practices when collecting difficulty judgements. Studies crafted using the tool ask users to provide a selection of demographic information, then to annotate a certain number of texts and answer multiple-choice comprehension questions after each text. Researchers are encouraged to use a multi-level annotation scheme, to avoid the drawbacks of binary complexity annotations. Once a study is launched, its results are summarised in a visual representation accessible both to researchers and teachers, and can be downloaded in .csv format. Some findings of a pilot study designed with the tool are also provided in the article, to give an idea of the types of research questions it allows to answer.


pdf bib
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)
Núria Gala | Rodrigo Wilkens
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)

Text Simplification to Help Individuals with Low Vision Read More Fluently
Lauren Sauvan | Natacha Stolowy | Carlos Aguilar | Thomas François | Núria Gala | Frédéric Matonti | Eric Castet | Aurélie Calabrèse
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)

The objective of this work is to introduce text simplification as a potential reading aid to help improve the poor reading performance experienced by visually impaired individuals. As a first step, we explore what makes a text especially complex when read with low vision, by assessing the individual effect of three word properties (frequency, orthographic similarity and length) on reading speed in the presence of Central visual Field Loss (CFL). Individuals with bilateral CFL induced by macular diseases read pairs of French sentences displayed with the self-paced reading method. For each sentence pair, sentence n contained a target word matched with a synonym word of the same length included in sentence n+1. Reading time was recorded for each target word. Given the corpus we used, our results show that (1) word frequency has a significant effect on reading time (the more frequent the faster the reading speed) with larger amplitude (in the range of seconds) compared to normal vision; (2) word neighborhood size has a significant effect on reading time (the more neighbors the slower the reading speed), this effect being rather small in amplitude, but interestingly reversed compared to normal vision; (3) word length has no significant effect on reading time. Supporting the development of new and more effective assistive technology to help low vision is an important and timely issue, with massive potential implications for social and rehabilitation practices. The end goal of this project will be to use our findings to custom text simplification to this specific population and use it as an optimal and efficient reading aid.

Identifying Abstract and Concrete Words in French to Better Address Reading Difficulties
Daria Goriachun | Núria Gala
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)

Literature in psycholinguistics and neurosciences has showed that abstract and concrete concepts are perceived differently by our brain, and that the abstractness of a word can cause difficulties in reading. In order to integrate this parameter into an automatic text simplification (ATS) system for French readers, an annotated list with 7,898 abstract and concrete nouns has been semi-automatically developed. Our aim was to obtain abstract and concrete nouns from an initial manually annotated short list by using two distributional approaches: nearest neighbors and syntactic co-occurrences. The results of this experience have enabled to shed light on the different behaviors of concrete and abstract nouns in context. Besides, the final list, a resource per se in French available on demand, provides a valuable contribution since annotated resources based on cognitive variables such as concreteness or abstractness are scarce and very difficult to obtain. In future work, the list will be enlarged and integrated into an existing lexicon with ranked synonyms for the identification of complex words in text simplification applications.

Alector: A Parallel Corpus of Simplified French Texts with Alignments of Misreadings by Poor and Dyslexic Readers
Núria Gala | Anaïs Tack | Ludivine Javourey-Drevet | Thomas François | Johannes C. Ziegler
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we present a new parallel corpus addressed to researchers, teachers, and speech therapists interested in text simplification as a means of alleviating difficulties in children learning to read. The corpus is composed of excerpts drawn from 79 authentic literary (tales, stories) and scientific (documentary) texts commonly used in French schools for children aged between 7 to 9 years old. The excerpts were manually simplified at the lexical, morpho-syntactic, and discourse levels in order to propose a parallel corpus for reading tests and for the development of automatic text simplification tools. A sample of 21 poor-reading and dyslexic children with an average reading delay of 2.5 years read a portion of the corpus. The transcripts of readings errors were integrated into the corpus with the goal of identifying lexical difficulty in the target population. By means of statistical testing, we provide evidence that the manual simplifications significantly reduced reading errors, highlighting that the words targeted for simplification were not only well-chosen but also substituted with substantially easier alternatives. The entire corpus is available for consultation through a web interface and available on demand for research purposes.

The Nisvai Corpus of Oral Narrative Practices from Malekula (Vanuatu) and its Associated Language Resources
Jocelyn Aznar | Núria Gala
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we present a corpus of oral narratives from the Nisvai linguistic community and four associated language resources. Nisvai is an oral language spoken by 200 native speakers in the South-East of Malekula, an island of Vanuatu, Oceania. This language had never been the focus of a research before the one leading to this article. The corpus we present is made of 32 annotated narratives segmented into intonation units. The audio records were transcribed using the written conventions specifically developed for the language and translated into French. Four associated language resources have been generated by organizing the annotations into written documents: two of them are available online and two in paper format. The online resources allow the users to listen to the audio recordings whilereading the annotations. They were built to share the results of our fieldwork and to communicate on the Nisvai narrative practices with the researchers as well as with a more general audience. The bilingual paper resources, a booklet of narratives and a Nisvai-French French-Nisvai lexicon, were designed for the Nisvai community by taking into account their future uses (i.e. primary school).


Assisted Lexical Simplification for French Native Children with Reading Difficulties
Firas Hmida | Mokhtar B. Billami | Thomas François | Núria Gala
Proceedings of the 1st Workshop on Automatic Text Adaptation (ATA)

ReSyf: a French lexicon with ranked synonyms
Mokhtar B. Billami | Thomas François | Núria Gala
Proceedings of the 27th International Conference on Computational Linguistics

In this article, we present ReSyf, a lexical resource of monolingual synonyms ranked according to their difficulty to be read and understood by native learners of French. The synonyms come from an existing lexical network and they have been semantically disambiguated and refined. A ranking algorithm, based on a wide range of linguistic features and validated through an evaluation campaign with human annotators, automatically sorts the synonyms corresponding to a given word sense by reading difficulty. ReSyf is freely available and will be integrated into a web platform for reading assistance. It can also be applied to perform lexical simplification of French texts.


Création et validation de signatures sémantiques : application à la mesure de similarité sémantique et à la substitution lexicale (Creating and validating semantic signatures : application for measuring semantic similarity and lexical substitution)
Mokhtar-Boumedyen Billami | Núria Gala
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 - Articles longs

L’intégration de la notion de similarité sémantique entre les unités lexicales est essentielle dans différentes applications de Traitement Automatique des Langues (TAL). De ce fait, elle a reçu un intérêt considérable qui a eu comme conséquence le développement d’une vaste gamme d’approches pour en déterminer une mesure. Ainsi, plusieurs types de mesures de similarité existent, elles utilisent différentes représentations obtenues à partir d’informations soit dans des ressources lexicales, soit dans de gros corpus de données ou bien dans les deux. Dans cet article, nous nous intéressons à la création de signatures sémantiques décrivant des représentations vectorielles de mots à partir du réseau lexical JeuxDeMots (JDM). L’évaluation de ces signatures est réalisée sur deux tâches différentes : mesures de similarité sémantique et substitution lexicale. Les résultats obtenus sont très satisfaisants et surpassent, dans certains cas, les performances des systèmes de l’état de l’art.


pdf bib
Bleu, contusion, ecchymose : tri automatique de synonymes en fonction de leur difficulté de lecture et compréhension (Automatic ranking of synonyms according to their reading and comprehension difficulty)
Thomas Francois | Mokhtar B. Billami | Núria Gala | Delphine Bernhard
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Articles longs)

La lisibilité d’un texte dépend fortement de la difficulté des unités lexicales qui le composent. La simplification lexicale vise ainsi à remplacer les termes complexes par des équivalents sémantiques plus simples à comprendre : par exemple, BLEU (‘résultat d’un choc’) est plus simple que CONTUSION ou ECCHYMOSE. Il est pour cela nécessaire de disposer de ressources qui listent des synonymes pour des sens donnés et les trient par ordre de difficulté. Cet article décrit une méthode pour constituer une ressource de ce type pour le français. Les listes de synonymes sont extraites de BabelNet et de JeuxDeMots, puis triées grâce à un algorithme statistique d’ordonnancement. Les résultats du tri sont évalués par rapport à 36 listes de synonymes ordonnées manuellement par quarante annotateurs.

Reducing lexical complexity as a tool to increase text accessibility for children with dyslexia
Núria Gala | Johannes Ziegler
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

Lexical complexity plays a central role in readability, particularly for dyslexic children and poor readers because of their slow and laborious decoding and word recognition skills. Although some features to aid readability may be common to most languages (e.g., the majority of ‘easy’ words are of low frequency), we believe that lexical complexity is mainly language-specific. In this paper, we define lexical complexity for French and we present a pilot study on the effects of text simplification in dyslexic children. The participants were asked to read out loud original and manually simplified versions of a standardized French text corpus and to answer comprehension questions after reading each text. The analysis of the results shows that the simplifications performed were beneficial in terms of reading speed and they reduced the number of reading errors (mainly lexical ones) without a loss in comprehension. Although the number of participants in this study was rather small (N=10), the results are promising and contribute to the development of applications in computational linguistics.

Are Cohesive Features Relevant for Text Readability Evaluation?
Amalia Todirascu | Thomas François | Delphine Bernhard | Núria Gala | Anne-Laure Ligozat
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper investigates the effectiveness of 65 cohesion-based variables that are commonly used in the literature as predictive features to assess text readability. We evaluate the efficiency of these variables across narrative and informative texts intended for an audience of L2 French learners. In our experiments, we use a French corpus that has been both manually and automatically annotated as regards to co-reference and anaphoric chains. The efficiency of the 65 variables for readability is analyzed through a correlational analysis and some modelling experiments.


POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools
Ahmed Hamdi | Alexis Nasr | Nizar Habash | Núria Gala
Proceedings of the Second Workshop on Arabic Natural Language Processing


A model to predict lexical complexity and to grade words (Un modèle pour prédire la complexité lexicale et graduer les mots) [in French]
Núria Gala | Thomas François | Delphine Bernhard | Cédrick Fairon
Proceedings of TALN 2014 (Volume 1: Long Papers)

pdf bib
Proceedings of TALN 2014 (Volume 4: RECITAL - Student Research Workshop)
Núria Gala | Klim Peshkov | Brigitte Bigi
Proceedings of TALN 2014 (Volume 4: RECITAL - Student Research Workshop)

Automatically building a Tunisian Lexicon for Deverbal Nouns
Ahmed Hamdi | Núria Gala | Alexis Nasr
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects

FLELex: a graded lexical resource for French foreign learners
Thomas François | Nùria Gala | Patrick Watrin | Cédrick Fairon
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we present FLELex, the first graded lexicon for French as a foreign language (FFL) that reports word frequencies by difficulty level (according to the CEFR scale). It has been obtained from a tagged corpus of 777,000 words from available textbooks and simplified readers intended for FFL learners. Our goal is to freely provide this resource to the community to be used for a variety of purposes going from the assessment of the lexical difficulty of a text, to the selection of simpler words within text simplification systems, and also as a dictionary in assistive tools for writing.


Propagation de polarités dans des familles de mots : impact de la morphologie dans la construction d’un lexique pour l’analyse de sentiments (Spreading Polarities among Word Families: Impact of Morphology on Building a Lexicon for Sentiment Analysis) [in French]
Núria Gala | Caroline Brun
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN


Création de clusters sémantiques dans des familles morphologiques à partir du TLFi (Creating semantic clusters in morphological families from the TLFi)
Nuria Gala | Nabil Hathout | Alexis Nasr | Véronique Rey | Selja Seppälä
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

La constitution de ressources linguistiques est une tâche longue et coûteuse. C’est notamment le cas pour les ressources morphologiques. Ces ressources décrivent de façon approfondie et explicite l’organisation morphologique du lexique complétée d’informations sémantiques exploitables dans le domaine du TAL. Le travail que nous présentons dans cet article s’inscrit dans cette perspective et, plus particulièrement, dans l’optique d’affiner une ressource existante en s’appuyant sur des informations sémantiques obtenues automatiquement. Notre objectif est de caractériser sémantiquement des familles morpho-phonologiques (des mots partageant une même racine et une continuité de sens). Pour ce faire, nous avons utilisé des informations extraites du TLFi annoté morpho-syntaxiquement. Les premiers résultats de ce travail seront analysés et discutés.


A Tool for Linking Stems and Conceptual Fragments to Enhance word Access
Nuria Gala | Véronique Rey | Michael Zock
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Electronic dictionaries offer many possibilities unavailable in paper dictionaries to view, display or access information. However, even these resources fall short when it comes to access words sharing semantic features and certain aspects of form: few applications offer the possibility to access a word via a morphologically or semantically related word. In this paper, we present such an application, Polymots, a lexical database for contemporary French containing 20.000 words grouped in 2.000 families. The purpose of this resource is to group words into families on the basis of shared morpho-phonological and semantic information. Words with a common stem form a family; words in a family also share a set of common conceptual fragments (in some families there is a continuity of meaning, in others meaning is distributed). With this approach, we capitalize on the bidirectional link between semantics and morpho-phonology : the user can thus access words not only on the basis of ideas, but also on the basis of formal characteristics of the word, i.e. its morphological features. The resulting lexical database should help people learn French vocabulary and assist them to find words they are looking for, going thus beyond other existing lexical resources.


Dispersion sémantique dans des familles morpho-phonologiques : éléments théoriques et empiriques
Nuria Gala | Véronique Rey | Laurent Tichit
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Traditionnellement, la morphologie lexicale a été diachronique et a permis de proposer le concept de famille de mots. Ce dernier est repris dans les études en synchronie et repose sur une forte cohérence sémantique entre les mots d’une même famille. Dans cet article, nous proposons une approche en synchronie fondée sur la notion de continuité à la fois phonologique et sémantique. Nous nous intéressons, d’une part, à la morpho-phonologie et, d’autre part, à la dispersion sémantique des mots dans les familles. Une première étude (Gala & Rey, 2008) montrait que les familles de mots obtenues présentaient des espaces sémantiques soit de grande cohésion soit de grande dispersion. Afin de valider ces observations, nous présentons ici une méthode empirique qui permet de pondérer automatiquement les unités de sens d’un mot et d’une famille. Une expérience menée auprès de 30 locuteurs natifs valide notre approche et ouvre la voie pour une étude approfondie du lexique sur ces bases phonologiques et sémantiques.


POLYMOTS : une base de données de constructions dérivationnelles en français à partir de radicaux phonologiques
Nuria Gala | Véronique Rey
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Cet article présente POLYMOTS, une base de données lexicale contenant huit mille mots communs en français. L’originalité de l’approche proposée tient à l’analyse des mots. En effet, à la différence d’autres bases lexicales représentant la morphologie dérivationnelle des mots à partir d’affixes, ici l’idée a été d’isoler un radical commun à un ensemble de mots d’une même famille. Nous avons donc analysé les formes des mots et, par comparaison phonologique (forme phonique comparable) et morphologique (continuité de sens), nous avons regroupé les mots par familles, selon le type de radical phonologique. L’article présente les fonctionnalités de la base et inclut une discussion sur les applications et les perspectives d’une telle ressource.


Using an incremental robust parser to automatically generate semantic UNL graphs
Nuria Gala
Proceedings of the 3rd workshop on RObust Methods in Analysis of Natural Language Data (ROMAND 2004)