This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
NuriaGala
Also published as:
Núria Gala,
Nùria Gala
Fixing paper assignments
Please select all papers that do not belong to this person.
Indicate below which author they should be assigned to.
La Langue française Parlée Complétée (LfPC) est un système de communication développé pour les personnes sourdes afin de compléter la lecture labiale avec une main, au niveau phonétique. Il est utilisé par les enfants pour acquérir des compétences en lecture, en lecture labiale et en communication orale. L’objectif principal est de permettre aux enfants sourds de devenir des lecteurs et des locuteurs compétents en langue française. Nous proposons une preuve de concept (PoC) d’un système de réalité augmentée qui place automatiquement la représentation d’une main codeuse sur la vidéo pré-enregistrée d’un locuteur. Le PoC prédit la forme et la position de la main, le moment durant lequel elle doit être affichée, et ses coordonnées relativement au visage dans la vidéo. Des photos de mains sont ensuite juxtaposées à la vidéo. Des vidéos annotées automatiquement par le PoC ont été montrées à des personnes sourdes qui l’ont accueilli et évalué favorablement.
In this position paper we present a methodology to automatically annotate French text for Cued Speech (CS), a communication system developed for people with hearing loss to complement speech reading at the phonetic level. This visual communication mode uses handshapes in different placements near the face in combination with the mouth movements (called ‘cues’ or ‘keys’) to make the phonemes of spoken language look different from each other. CS is used to acquire skills in lip reading, in oral communication and for reading. Despite many studies demonstrating its benefits, there are few resources available for learning and practicing it, especially in French. We thus propose a methodology to phonemize written corpora so that each word is aligned with the corresponding CS key(s). This methodology is proposed as part of a wider project aimed at creating an augmented reality system displaying a virtual coding hand where the user will be able to choose a text upon its complexity for cueing.
Reducing the complexity of texts by applying an Automatic Text Simplification (ATS) system has been sparking interest inthe area of Natural Language Processing (NLP) for several years and a number of methods and evaluation campaigns haveemerged targeting lexical and syntactic transformations. In recent years, several studies exploit deep learning techniques basedon very large comparable corpora. Yet the lack of large amounts of corpora (original-simplified) for French has been hinderingthe development of an ATS tool for this language. In this paper, we present our system, which is based on a combination ofmethods relying on word embeddings for lexical simplification and rule-based strategies for syntax and discourse adaptations. We present an evaluation of the lexical, syntactic and discourse-level simplifications according to automatic and humanevaluations. We discuss the performances of our system at the lexical, syntactic, and discourse levels
In this paper, we present HIBOU, an eBook application initially developed for iOs, displaying adapted texts (i.e. simplified), and proposing text comprehension activities. The application has been used in six elementary schools in France to evaluate and train reading fluency and comprehension skills on beginning readers of French. HIBOU displays two versions of French literary and documentary texts from the ALECTOR corpus, the ‘original’, and a simplified version. Text simplifications have been manually performed at the lexical, syntactic, and discursive levels. The child can read in autonomy and has access to different games on word identification. HIBOU is at present being developed to be online in a platform that will be available at elementary schools in France.
Annotations of word difficulty by readers provide invaluable insights into lexical complexity. Yet, there is currently a paucity of tools allowing researchers to gather such annotations in an adaptable and simple manner. This article presents PADDLe, an online platform aiming to fill that gap and designed to encourage best practices when collecting difficulty judgements. Studies crafted using the tool ask users to provide a selection of demographic information, then to annotate a certain number of texts and answer multiple-choice comprehension questions after each text. Researchers are encouraged to use a multi-level annotation scheme, to avoid the drawbacks of binary complexity annotations. Once a study is launched, its results are summarised in a visual representation accessible both to researchers and teachers, and can be downloaded in .csv format. Some findings of a pilot study designed with the tool are also provided in the article, to give an idea of the types of research questions it allows to answer.
In this paper, we present a new parallel corpus addressed to researchers, teachers, and speech therapists interested in text simplification as a means of alleviating difficulties in children learning to read. The corpus is composed of excerpts drawn from 79 authentic literary (tales, stories) and scientific (documentary) texts commonly used in French schools for children aged between 7 to 9 years old. The excerpts were manually simplified at the lexical, morpho-syntactic, and discourse levels in order to propose a parallel corpus for reading tests and for the development of automatic text simplification tools. A sample of 21 poor-reading and dyslexic children with an average reading delay of 2.5 years read a portion of the corpus. The transcripts of readings errors were integrated into the corpus with the goal of identifying lexical difficulty in the target population. By means of statistical testing, we provide evidence that the manual simplifications significantly reduced reading errors, highlighting that the words targeted for simplification were not only well-chosen but also substituted with substantially easier alternatives. The entire corpus is available for consultation through a web interface and available on demand for research purposes.
In this paper, we present a corpus of oral narratives from the Nisvai linguistic community and four associated language resources. Nisvai is an oral language spoken by 200 native speakers in the South-East of Malekula, an island of Vanuatu, Oceania. This language had never been the focus of a research before the one leading to this article. The corpus we present is made of 32 annotated narratives segmented into intonation units. The audio records were transcribed using the written conventions specifically developed for the language and translated into French. Four associated language resources have been generated by organizing the annotations into written documents: two of them are available online and two in paper format. The online resources allow the users to listen to the audio recordings whilereading the annotations. They were built to share the results of our fieldwork and to communicate on the Nisvai narrative practices with the researchers as well as with a more general audience. The bilingual paper resources, a booklet of narratives and a Nisvai-French French-Nisvai lexicon, were designed for the Nisvai community by taking into account their future uses (i.e. primary school).
The objective of this work is to introduce text simplification as a potential reading aid to help improve the poor reading performance experienced by visually impaired individuals. As a first step, we explore what makes a text especially complex when read with low vision, by assessing the individual effect of three word properties (frequency, orthographic similarity and length) on reading speed in the presence of Central visual Field Loss (CFL). Individuals with bilateral CFL induced by macular diseases read pairs of French sentences displayed with the self-paced reading method. For each sentence pair, sentence n contained a target word matched with a synonym word of the same length included in sentence n+1. Reading time was recorded for each target word. Given the corpus we used, our results show that (1) word frequency has a significant effect on reading time (the more frequent the faster the reading speed) with larger amplitude (in the range of seconds) compared to normal vision; (2) word neighborhood size has a significant effect on reading time (the more neighbors the slower the reading speed), this effect being rather small in amplitude, but interestingly reversed compared to normal vision; (3) word length has no significant effect on reading time. Supporting the development of new and more effective assistive technology to help low vision is an important and timely issue, with massive potential implications for social and rehabilitation practices. The end goal of this project will be to use our findings to custom text simplification to this specific population and use it as an optimal and efficient reading aid.
Literature in psycholinguistics and neurosciences has showed that abstract and concrete concepts are perceived differently by our brain, and that the abstractness of a word can cause difficulties in reading. In order to integrate this parameter into an automatic text simplification (ATS) system for French readers, an annotated list with 7,898 abstract and concrete nouns has been semi-automatically developed. Our aim was to obtain abstract and concrete nouns from an initial manually annotated short list by using two distributional approaches: nearest neighbors and syntactic co-occurrences. The results of this experience have enabled to shed light on the different behaviors of concrete and abstract nouns in context. Besides, the final list, a resource per se in French available on demand, provides a valuable contribution since annotated resources based on cognitive variables such as concreteness or abstractness are scarce and very difficult to obtain. In future work, the list will be enlarged and integrated into an existing lexicon with ranked synonyms for the identification of complex words in text simplification applications.
In this article, we present ReSyf, a lexical resource of monolingual synonyms ranked according to their difficulty to be read and understood by native learners of French. The synonyms come from an existing lexical network and they have been semantically disambiguated and refined. A ranking algorithm, based on a wide range of linguistic features and validated through an evaluation campaign with human annotators, automatically sorts the synonyms corresponding to a given word sense by reading difficulty. ReSyf is freely available and will be integrated into a web platform for reading assistance. It can also be applied to perform lexical simplification of French texts.
L’intégration de la notion de similarité sémantique entre les unités lexicales est essentielle dans différentes applications de Traitement Automatique des Langues (TAL). De ce fait, elle a reçu un intérêt considérable qui a eu comme conséquence le développement d’une vaste gamme d’approches pour en déterminer une mesure. Ainsi, plusieurs types de mesures de similarité existent, elles utilisent différentes représentations obtenues à partir d’informations soit dans des ressources lexicales, soit dans de gros corpus de données ou bien dans les deux. Dans cet article, nous nous intéressons à la création de signatures sémantiques décrivant des représentations vectorielles de mots à partir du réseau lexical JeuxDeMots (JDM). L’évaluation de ces signatures est réalisée sur deux tâches différentes : mesures de similarité sémantique et substitution lexicale. Les résultats obtenus sont très satisfaisants et surpassent, dans certains cas, les performances des systèmes de l’état de l’art.
La lisibilité d’un texte dépend fortement de la difficulté des unités lexicales qui le composent. La simplification lexicale vise ainsi à remplacer les termes complexes par des équivalents sémantiques plus simples à comprendre : par exemple, BLEU (‘résultat d’un choc’) est plus simple que CONTUSION ou ECCHYMOSE. Il est pour cela nécessaire de disposer de ressources qui listent des synonymes pour des sens donnés et les trient par ordre de difficulté. Cet article décrit une méthode pour constituer une ressource de ce type pour le français. Les listes de synonymes sont extraites de BabelNet et de JeuxDeMots, puis triées grâce à un algorithme statistique d’ordonnancement. Les résultats du tri sont évalués par rapport à 36 listes de synonymes ordonnées manuellement par quarante annotateurs.
This paper investigates the effectiveness of 65 cohesion-based variables that are commonly used in the literature as predictive features to assess text readability. We evaluate the efficiency of these variables across narrative and informative texts intended for an audience of L2 French learners. In our experiments, we use a French corpus that has been both manually and automatically annotated as regards to co-reference and anaphoric chains. The efficiency of the 65 variables for readability is analyzed through a correlational analysis and some modelling experiments.
Lexical complexity plays a central role in readability, particularly for dyslexic children and poor readers because of their slow and laborious decoding and word recognition skills. Although some features to aid readability may be common to most languages (e.g., the majority of ‘easy’ words are of low frequency), we believe that lexical complexity is mainly language-specific. In this paper, we define lexical complexity for French and we present a pilot study on the effects of text simplification in dyslexic children. The participants were asked to read out loud original and manually simplified versions of a standardized French text corpus and to answer comprehension questions after reading each text. The analysis of the results shows that the simplifications performed were beneficial in terms of reading speed and they reduced the number of reading errors (mainly lexical ones) without a loss in comprehension. Although the number of participants in this study was rather small (N=10), the results are promising and contribute to the development of applications in computational linguistics.
In this paper we present FLELex, the first graded lexicon for French as a foreign language (FFL) that reports word frequencies by difficulty level (according to the CEFR scale). It has been obtained from a tagged corpus of 777,000 words from available textbooks and simplified readers intended for FFL learners. Our goal is to freely provide this resource to the community to be used for a variety of purposes going from the assessment of the lexical difficulty of a text, to the selection of simpler words within text simplification systems, and also as a dictionary in assistive tools for writing.
La constitution de ressources linguistiques est une tâche longue et coûteuse. C’est notamment le cas pour les ressources morphologiques. Ces ressources décrivent de façon approfondie et explicite l’organisation morphologique du lexique complétée d’informations sémantiques exploitables dans le domaine du TAL. Le travail que nous présentons dans cet article s’inscrit dans cette perspective et, plus particulièrement, dans l’optique d’affiner une ressource existante en s’appuyant sur des informations sémantiques obtenues automatiquement. Notre objectif est de caractériser sémantiquement des familles morpho-phonologiques (des mots partageant une même racine et une continuité de sens). Pour ce faire, nous avons utilisé des informations extraites du TLFi annoté morpho-syntaxiquement. Les premiers résultats de ce travail seront analysés et discutés.
Electronic dictionaries offer many possibilities unavailable in paper dictionaries to view, display or access information. However, even these resources fall short when it comes to access words sharing semantic features and certain aspects of form: few applications offer the possibility to access a word via a morphologically or semantically related word. In this paper, we present such an application, Polymots, a lexical database for contemporary French containing 20.000 words grouped in 2.000 families. The purpose of this resource is to group words into families on the basis of shared morpho-phonological and semantic information. Words with a common stem form a family; words in a family also share a set of common conceptual fragments (in some families there is a continuity of meaning, in others meaning is distributed). With this approach, we capitalize on the bidirectional link between semantics and morpho-phonology : the user can thus access words not only on the basis of ideas, but also on the basis of formal characteristics of the word, i.e. its morphological features. The resulting lexical database should help people learn French vocabulary and assist them to find words they are looking for, going thus beyond other existing lexical resources.
Traditionnellement, la morphologie lexicale a été diachronique et a permis de proposer le concept de famille de mots. Ce dernier est repris dans les études en synchronie et repose sur une forte cohérence sémantique entre les mots d’une même famille. Dans cet article, nous proposons une approche en synchronie fondée sur la notion de continuité à la fois phonologique et sémantique. Nous nous intéressons, d’une part, à la morpho-phonologie et, d’autre part, à la dispersion sémantique des mots dans les familles. Une première étude (Gala & Rey, 2008) montrait que les familles de mots obtenues présentaient des espaces sémantiques soit de grande cohésion soit de grande dispersion. Afin de valider ces observations, nous présentons ici une méthode empirique qui permet de pondérer automatiquement les unités de sens d’un mot et d’une famille. Une expérience menée auprès de 30 locuteurs natifs valide notre approche et ouvre la voie pour une étude approfondie du lexique sur ces bases phonologiques et sémantiques.
Cet article présente POLYMOTS, une base de données lexicale contenant huit mille mots communs en français. L’originalité de l’approche proposée tient à l’analyse des mots. En effet, à la différence d’autres bases lexicales représentant la morphologie dérivationnelle des mots à partir d’affixes, ici l’idée a été d’isoler un radical commun à un ensemble de mots d’une même famille. Nous avons donc analysé les formes des mots et, par comparaison phonologique (forme phonique comparable) et morphologique (continuité de sens), nous avons regroupé les mots par familles, selon le type de radical phonologique. L’article présente les fonctionnalités de la base et inclut une discussion sur les applications et les perspectives d’une telle ressource.