Alexis Nasr

2025

Factual Knowledge Assessment of Language Models Using Distractors
Hichem Ammar Khodja | Abderrahmane Ait gueni ssaid | Frederic Bechet | Quentin Brabant | Alexis Nasr | Gwénolé Lecorvé
Proceedings of the 31st International Conference on Computational Linguistics

Language models encode extensive factual knowledge within their parameters. The accurate assessment of this knowledge is crucial for understanding and improving these models. In the literature, factual knowledge assessment often relies on cloze sentences, which can lead to erroneous conclusions due to the complexity of natural language (out-of-subject continuations, the existence of many correct answers and the several ways of expressing them). In this paper, we introduce a new interpretable knowledge assessment method that mitigates these issues by leveraging distractors—incorrect but plausible alternatives to the correct answer. We propose several strategies for retrieving distractors and determine the most effective one through experimentation. Our method is evaluated against existing approaches, demonstrating solid alignment with human judgment and stronger robustness to verbalization artifacts. The code and data to reproduce our experiments are available on GitHub.

Evaluating Pretrained Causal Language Models for Synonymy
Ioana Ivan | Carlos Ramisch | Alexis Nasr
Findings of the Association for Computational Linguistics: ACL 2025

The scaling of causal language models in size and training data enabled them to tackle increasingly complex tasks. Despite the development of sophisticated tests to reveal their new capabilities, the underlying basis of these complex skills remains unclear. We argue that complex skills might be explained using simpler ones, represented by linguistic concepts. As an initial step in exploring this hypothesis, we focus on the lexical-semantic concept of synonymy, laying the groundwork for research into its relationship with more complex skills. We develop a comprehensive test suite to assess various aspects of synonymy under different conditions, and evaluate causal open-source models ranging up to 10 billion parameters. We find that these models effectively recognize synonymy but struggle to generate synonyms when prompted with relevant context.

Connaissances factuelles dans les modèles de langue : robustesse et anomalies face à des variations simples du contexte temporel
Hichem Ammar Khodja | Frédéric Béchet | Quentin Brabant | Alexis Nasr | Gwénolé Lecorvé
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux

This paper explores the robustness of language models (LMs) to variations in temporal context within factual knowledge. It examines whether LMs can correctly associate a temporal context with a past fact that is valid over a bounded period, by asking them to distinguish correct contexts from incorrect ones. The LMs’ ability to distinguish is analyzed along two dimensions: the distance of the incorrect context from the validity period and the granularity of the context. To this end, a dataset, TimeStress, is introduced, making it possible to test 18 diverse LMs. The results reveal that the best LM achieves perfect distinction for only 11% of the facts studied, with critical errors that a human would not make. This work highlights the limitations of current LMs in temporal representation.

Factual Knowledge in Language Models: Robustness and Anomalies under Simple Temporal Context Variations
Hichem Ammar Khodja | Frederic Bechet | Quentin Brabant | Alexis Nasr | Gwénolé Lecorvé
Proceedings of the First Workshop on Large Language Model Memorization (L2M2)

This paper explores the robustness of language models (LMs) to variations in the temporal context within factual knowledge. It examines whether LMs can correctly associate a temporal context with a past fact valid over a defined period, by asking them to differentiate correct from incorrect contexts. The LMs’ ability to distinguish is analyzed along two dimensions: the distance of the incorrect context from the validity period and the granularity of the context. To this end, a dataset called TimeStress is introduced, enabling the evaluation of 18 diverse LMs. Results reveal that the best LM achieves perfect distinction for only 11% of the studied facts, with errors that are rare but critical and that humans would not make. This work highlights the limitations of current LMs in temporal representation.

2024

WikiFactDiff: Un Grand jeu de données Réaliste et Temporellement Adaptable pour la Mise à Jour Atomique des Connaissances Factuelles dans les Modèles de Langue Causaux
Hichem Ammar Khodja | Frédéric Béchet | Quentin Brabant | Alexis Nasr | Gwénolé Lecorvé
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

The factuality of language models degrades over time, since events that occur after their training are unknown to them. One way to keep these models up to date could be factual updating at the level of atomic facts. To study this task, we present WikiFactDiff, a dataset that represents the changes that occurred between two dates as a set of simple facts, in RDF format, divided into three categories: facts to learn, facts to retain, and obsolete facts. These facts are verbalized to enable running update algorithms and evaluating them; this evaluation is presented in this paper. Unlike existing datasets, WikiFactDiff represents a realistic update setting that involves various scenarios, including fact replacements, fact archival, and the insertion of new entities.

WikiFactDiff: A Large, Realistic, and Temporally Adaptable Dataset for Atomic Factual Knowledge Update in Causal Language Models
Hichem Ammar Khodja | Frédéric Béchet | Quentin Brabant | Alexis Nasr | Gwénolé Lecorvé
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The factuality of large language models (LLMs) tends to decay over time since events posterior to their training are “unknown” to them. One way to keep models up-to-date could be factual update: the task of inserting, replacing, or removing certain simple (atomic) facts within the model. To study this task, we present WikiFactDiff, a dataset that describes the evolution of factual knowledge between two dates as a collection of simple facts divided into three categories: new, obsolete, and static. We describe several update scenarios arising from various combinations of these three types of basic update. The facts are represented by subject-relation-object triples; indeed, WikiFactDiff was constructed by comparing the state of the Wikidata knowledge base on 4 January 2021 and 27 February 2023. Those facts are accompanied by verbalization templates and cloze tests that enable running update algorithms and their evaluation metrics. Contrary to other datasets, such as zsRE and CounterFact, WikiFactDiff constitutes a realistic update setting that involves various update scenarios, including replacements, archival, and new entity insertions. We also present an evaluation of existing update algorithms on WikiFactDiff.

2023

Investigating the Effect of Relative Positional Embeddings on AMR-to-Text Generation with Structural Adapters
Sebastien Montella | Alexis Nasr | Johannes Heinecke | Frederic Bechet | Lina M. Rojas Barahona
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Text generation from Abstract Meaning Representation (AMR) has substantially benefited from the popularized Pretrained Language Models (PLMs). Myriad approaches have linearized the input graph as a sequence of tokens to fit the PLM tokenization requirements. Nevertheless, this transformation jeopardizes the structural integrity of the graph and is therefore detrimental to its resulting representation. To overcome this issue, Ribeiro et al. (2021b) have recently proposed StructAdapt, a structure-aware adapter which injects the input graph connectivity within PLMs using Graph Neural Networks (GNNs). In this paper, we investigate the influence of Relative Position Embeddings (RPE) on AMR-to-Text, and, in parallel, we examine the robustness of StructAdapt. Through ablation studies, graph attack and link prediction, we reveal that RPE might be partially encoding input graphs. We suggest further research regarding the role of RPE will provide valuable insights for Graph-to-Text generation.

2022

Transfer Learning and Masked Generation for Answer Verbalization
Sebastien Montella | Lina Rojas-Barahona | Frederic Bechet | Johannes Heinecke | Alexis Nasr
Proceedings of the Workshop on Structured and Unstructured Knowledge Integration (SUKI)

Structured Knowledge has recently emerged as an essential component to support fine-grained Question Answering (QA). In general, QA systems query a Knowledge Base (KB) to detect and extract the raw answers as final prediction. However, because such raw answers lack context, language generation can offer a much more informative and complete response. In this paper, we propose to combine the power of transfer learning and the advantage of entity placeholders to produce high-quality verbalization of extracted answers from a KB. We claim that such an approach is especially well-suited for answer generation. Our experiments show 44.25%, 3.26% and 29.10% relative gain in BLEU over the state-of-the-art on the VQuAnDA, ParaQA and VANiLLa datasets, respectively. We additionally provide minor hallucination corrections for VANiLLa, covering 5% of each of the training and test sets. We witness a median absolute gain of 0.81 SacreBLEU. This underscores the importance of data quality when using automated evaluation.

Dependency Parsing with Backtracking using Deep Reinforcement Learning
Franck Dary | Maxime Petit | Alexis Nasr
Transactions of the Association for Computational Linguistics, Volume 10

Greedy algorithms for NLP such as transition-based parsing are prone to error propagation. One way to overcome this problem is to allow the algorithm to backtrack and explore an alternative solution in cases where new evidence contradicts the solution explored so far. In order to implement such a behavior, we use reinforcement learning and let the algorithm backtrack in cases where such an action gets a better reward than continuing to explore the current solution. We test this idea on both POS tagging and dependency parsing and show that backtracking is an effective means to fight against error propagation.

2021

TALEP at CMCL 2021 Shared Task: Non Linear Combination of Low and High-Level Features for Predicting Eye-Tracking Data
Franck Dary | Alexis Nasr | Abdellah Fourtassi
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

In this paper we describe our contribution to the CMCL 2021 Shared Task, which consists in predicting 5 different eye tracking variables from English tokenized text. Our approach is based on a neural network that combines both raw textual features we extracted from the text and parser-based features that include linguistic predictions (e.g. part of speech) and complexity metrics (e.g., entropy of parsing). We found that both the features we considered as well as the architecture of the neural model that combined these features played a role in the overall performance. Our system achieved relatively high accuracy on the test data of the challenge and was ranked 2nd out of 13 competing teams and a total of 30 submissions.

The Reading Machine: A Versatile Framework for Studying Incremental Parsing Strategies
Franck Dary | Alexis Nasr
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)

The Reading Machine is a parsing framework that takes raw text as input and performs six standard NLP tasks: tokenization, POS tagging, morphological analysis, lemmatization, dependency parsing and sentence segmentation. It is built upon transition-based parsing and makes it possible to implement a large number of parsing configurations, including a fully incremental one. Three case studies are presented to highlight the versatility of the framework. The first one explores whether an incremental parser is able to take into account top-down dependencies (i.e. the influence of high level decisions on low level ones), the second compares the performance of an incremental and a pipeline architecture, and the third quantifies the impact of the right context on the predictions made by an incremental parser.

2020

SLICE: Supersense-based Lightweight Interpretable Contextual Embeddings
Cindy Aloui | Carlos Ramisch | Alexis Nasr | Lucie Barque
Proceedings of the 28th International Conference on Computational Linguistics

Contextualised embeddings such as BERT have become de facto state-of-the-art references in many NLP applications, thanks to their impressive performances. However, their opaqueness makes it hard to interpret their behaviour. SLICE is a hybrid model that combines supersense labels with contextual embeddings. We introduce a weakly supervised method to learn interpretable embeddings from raw corpora and small lists of seed words. Our model is able to represent both a word and its context as embeddings into the same compact space, whose dimensions correspond to interpretable supersenses. We assess the model in a task of supersense tagging for French nouns. The small amount of supervision required makes it particularly well suited for low-resource scenarios. Thanks to its interpretability, we perform linguistic analyses of the predicted supersenses in terms of input word and context representations.

2019

CALOR-QUEST : un corpus d’entraînement et d’évaluation pour la compréhension automatique de textes (Machine reading comprehension is a task related to Question-Answering where questions are not generic in scope but are related to a particular document)
Frederic Bechet | Cindy Aloui | Delphine Charlet | Geraldine Damnati | Johannes Heinecke | Alexis Nasr | Frederic Herledan
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

Machine reading comprehension is a task belonging to the family of Question Answering systems in which questions are not general in scope but are tied to a particular document. Recently, very large corpora (SQuAD, MS MARCO) containing (document, question, answer) triplets have been made available to the scientific community in order to develop supervised methods based on deep neural networks, with promising results. These methods are, however, very hungry for training data, which currently exist only for English. The aim of this study is to enable the development of such resources for other languages at lower cost by proposing a method that semi-automatically generates questions from a semantic analysis of a large corpus. The collection of natural questions is reduced to a validation/test set. Applying this method to the CALOR-Frame corpus made it possible to develop the CALOR-QUEST resource presented in this article.

CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations
Frederic Bechet | Cindy Aloui | Delphine Charlet | Geraldine Damnati | Johannes Heinecke | Alexis Nasr | Frederic Herledan
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

Machine reading comprehension is a task related to Question-Answering where questions are not generic in scope but are related to a particular document. Recently very large corpora (SQuAD, MS MARCO) containing triplets (document, question, answer) were made available to the scientific community to develop supervised methods based on deep neural networks with promising results. These methods need very large training corpora to be effective; however, such data currently exist only for English and Chinese. The aim of this study is to develop such resources for other languages by generating questions semi-automatically from the semantic Frame analysis of large corpora. The collection of natural questions is reduced to a validation/test set. We applied this method to the CALOR-Frame French corpus to develop the CALOR-QUEST resource presented in this paper.

Typological Features for Multilingual Delexicalised Dependency Parsing
Manon Scholivet | Franck Dary | Alexis Nasr | Benoit Favre | Carlos Ramisch
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The existence of universal models to describe the syntax of languages has been debated for decades. The availability of resources such as the Universal Dependencies treebanks and the World Atlas of Language Structures makes it possible to study the plausibility of universal grammar from the perspective of dependency parsing. Our work investigates the use of high-level language descriptions in the form of typological features for multilingual dependency parsing. Our experiments on multilingual parsing for 40 languages show that typological information can indeed guide parsers to share information between similar languages beyond simple language identification.

2018

Correction automatique d’attachements prépositionnels par utilisation de traits visuels (PP-attachement resolution using visual features)
Sébastien Delecraz | Leonor Becerra-Bonache | Benoît Favre | Alexis Nasr | Frédéric Bechet
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Disambiguating prepositional-phrase attachments is a syntactic task that requires semantic knowledge, which can be extracted from an image associated with the text being processed. We present and analyze the difficulties of this task, for which we build a complete system trained on an extended version of the annotations of the Flickr30k Entities corpus. When lexical semantics is not available, visual information yields a 3% improvement.

Annotation en Actes de Dialogue pour les Conversations d’Assistance en Ligne (Dialog Acts Annotations for Online Chats)
Robin Perrotin | Alexis Nasr | Jeremy Auguste
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Online technical conversations are a type of linguistic production that differs in many respects from the objects more commonly studied in natural language processing: they are written dialogues between two speakers that support the cooperative resolution of users’ problems. We propose to describe these conversations with a dialogue-act tagging scheme specifically designed for online conversations. Several prediction systems were evaluated, as well as a method for abstracting away from the lexical specificities of the training corpus.

Handling Normalization Issues for Part-of-Speech Tagging of Online Conversational Text
Géraldine Damnati | Jeremy Auguste | Alexis Nasr | Delphine Charlet | Johannes Heinecke | Frédéric Béchet
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Semantic Frame Parsing for Information Extraction : the CALOR corpus
Gabriel Marzinotto | Jeremy Auguste | Frederic Bechet | Geraldine Damnati | Alexis Nasr
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Adding Syntactic Annotations to Flickr30k Entities Corpus for Multimodal Ambiguous Prepositional-Phrase Attachment Resolution
Sebastien Delecraz | Alexis Nasr | Frederic Bechet | Benoit Favre
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

TAG Parsing with Neural Networks and Vector Representations of Supertags
Jungo Kasai | Robert Frank | R. Thomas McCoy | Owen Rambow | Alexis Nasr
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present supertagging-based models for Tree Adjoining Grammar parsing that use neural network architectures and dense vector representation of supertags (elementary trees) to achieve state-of-the-art performance in unlabeled and labeled attachment scores. The shift-reduce parsing model eschews lexical information entirely, and uses only the 1-best supertags to parse a sentence, providing further support for the claim that supertagging is “almost parsing.” We demonstrate that the embedding vector representations the parser induces for supertags possess linguistically interpretable structure, supporting analogies between grammatical structures like those familiar from recent work in distributional semantics. This dense representation of supertags overcomes the drawbacks for statistical models of TAG as compared to CCG parsing, raising the possibility that TAG is a viable alternative for NLP tasks that require the assignment of richer structural descriptions to sentences.

Correcting prepositional phrase attachments using multimodal corpora
Sebastien Delecraz | Alexis Nasr | Frederic Bechet | Benoit Favre
Proceedings of the 15th International Conference on Parsing Technologies

PP-attachments are an important source of errors in parsing natural language. We propose in this article to use data coming from a multimodal corpus, combining textual, visual and conceptual information, as well as a correction strategy, to propose alternative attachments in the output of a parser.

2016

Deeper syntax for better semantic parsing
Olivier Michalon | Corentin Ribeyre | Marie Candito | Alexis Nasr
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Syntax plays an important role in the task of predicting the semantic structure of a sentence. But syntactic phenomena such as alternations, control and raising tend to obfuscate the relation between syntax and semantics. In this paper we predict the semantic structure of a sentence using a deeper syntactic representation than is usually employed. This deep syntactic representation abstracts away from purely syntactic phenomena and proposes a structural organization of the sentence that is closer to the semantic representation. Experiments conducted on a French corpus annotated with semantic frames showed that a semantic parser achieves better performance with such a deep syntactic input.

Integrating Selectional Constraints and Subcategorization Frames in a Dependency Parser
Seyed Abolghasem Mirroshandel | Alexis Nasr
Computational Linguistics, Volume 42, Issue 1 - March 2016

DeQue: A Lexicon of Complex Prepositions and Conjunctions in French
Carlos Ramisch | Alexis Nasr | André Valli | José Deulofeu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We introduce DeQue, a lexicon covering French complex prepositions (CPRE) like “à partir de” (from) and complex conjunctions (CCONJ) like “bien que” (although). The lexicon includes fine-grained linguistic description based on empirical evidence. We describe the general characteristics of CPRE and CCONJ in French, with special focus on syntactic ambiguity. Then, we list the selection criteria used to build the lexicon and the corpus-based methodology employed to collect entries. Finally, we quantify the ambiguity of each construction by annotating around 100 sentences randomly taken from the FRWaC. In addition to its theoretical value, the resource has many potential practical applications. We intend to employ DeQue for treebank annotation and to train a dependency parser that can take complex constructions into account.

Revisiting Supertagging and Parsing: How to Use Supertags in Transition-Based Parsing
Wonchang Chung | Suhas Siddhesh Mhatre | Alexis Nasr | Owen Rambow | Srinivas Bangalore
Proceedings of the 12th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+12)

Syntactic parsing of chat language in contact center conversation corpus
Alexis Nasr | Geraldine Damnati | Aleksandra Guerraz | Frederic Bechet
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2015

Joint Dependency Parsing and Multiword Expression Tokenization
Alexis Nasr | Carlos Ramisch | José Deulofeu | André Valli
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Rapid FrameNet annotation of spoken conversation transcripts
Jeremy Trione | Frederic Bechet | Benoit Favre | Alexis Nasr
Proceedings of the 11th Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-11)

POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools
Ahmed Hamdi | Alexis Nasr | Nizar Habash | Núria Gala
Proceedings of the Second Workshop on Arabic Natural Language Processing

2014

Automatically enriching spoken corpora with syntactic information for linguistic studies
Alexis Nasr | Frederic Bechet | Benoit Favre | Thierry Bazillon | Jose Deulofeu | Andre Valli
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Syntactic parsing of speech transcriptions is hampered by disfluencies that break the syntactic structure of the utterances. We propose in this paper two solutions to this problem. The first one relies on a disfluency predictor that detects disfluencies and removes them prior to parsing. The second one integrates the disfluencies into the syntactic structure of the utterances and trains a disfluency-aware parser.

Automatically building a Tunisian Lexicon for Deverbal Nouns
Ahmed Hamdi | Núria Gala | Alexis Nasr
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects

2013

The Effects of Factorizing Root and Pattern Mapping in Bidirectional Tunisian - Standard Arabic Machine Translation
Ahmed Hamdi | Rahma Boujelbane | Nizar Habash | Alexis Nasr
Proceedings of Machine Translation Summit XIV: Papers

Translating verbs between MSA and arabic dialects through deep morphological analysis (Un système de traduction de verbes entre arabe standard et arabe dialectal par analyse morphologique profonde) [in French]
Ahmed Hamdi | Rahma Boujelbane | Nizar Habash | Alexis Nasr
Proceedings of TALN 2013 (Volume 1: Long Papers)

Enforcing Subcategorization Constraints in a Parser Using Sub-parses Recombining
Seyed Abolghasem Mirroshandel | Alexis Nasr | Benoît Sagot
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

Syntactic annotation of spontaneous speech: application to call-center conversation data
Thierry Bazillon | Melanie Deplano | Frederic Bechet | Alexis Nasr | Benoit Favre
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the syntactic annotation process of the DECODA corpus. This corpus contains manual transcriptions of spoken conversations recorded in the French call-center of the Paris Public Transport Authority (RATP). Three levels of syntactic annotation have been performed with a semi-supervised approach: POS tags, Syntactic Chunks and Dependency parses. The main idea is to use off-the-shelf NLP tools and models, originally developed and trained on written text, to perform a first automatic annotation on the manually transcribed corpus. At the same time a fully manual annotation process is performed on a subset of the original corpus, called the GOLD corpus. An iterative process is then applied, consisting of manually correcting errors found in the automatic annotations, retraining the linguistic models of the NLP tools on this corrected corpus, then checking the quality of the adapted models on the fully manual annotations of the GOLD corpus. This process iterates until a certain error rate is reached. This paper describes this process, the main issues arising when adapting NLP tools to process speech transcriptions, and presents the first evaluations performed with these new adapted tools.

Semi-supervised Dependency Parsing using Lexical Affinities
Seyed Abolghasem Mirroshandel | Alexis Nasr | Joseph Le Roux
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Extracting a Semantic Lexicon of French Adjectives from a Large Lexicographic Dictionary
Selja Seppälä | Lucie Barque | Alexis Nasr
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

Generative Constituent Parsing and Discriminative Dependency Reranking: Experiments on English and French
Joseph Le Roux | Benoît Favre | Alexis Nasr | Seyed Abolghasem Mirroshandel
Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages

Dictionary-ontology cross-enrichment
Emmanuel Eckard | Lucie Barque | Alexis Nasr | Benoît Sagot
Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon

2011

Qui êtes-vous ? Catégoriser les questions pour déterminer le rôle des locuteurs dans des conversations orales (Who are you? Categorize questions to determine the role of speakers in oral conversations)
Thierry Bazillon | Benjamin Maza | Mickael Rouvier | Frédéric Béchet | Alexis Nasr
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Spoken data mining is a research field that aims to characterize an audio stream containing speech from one or more speakers, using descriptors related to the form and content of the signal. Beyond the automatic transcription of the spoken words, information about the type of audio stream being processed and about the role and identity of the speakers is also crucial to enable complex queries such as “find debates on topic X”, “find all interviews of Y”, etc. In this context, working with conversations recorded during radio or television broadcasts, we study how speakers express questions in conversations, starting from the initial intuition that the form of the questions asked is a signature of the speaker’s role in the conversation (host, guest, listener, etc.). By proposing a classification of question types and using this information alongside the descriptors commonly used in the literature to classify speakers by role, we hope to improve the classification step and, at the same time, validate our initial intuition.

Modèles génératif et discriminant en analyse syntaxique : expériences sur le corpus arboré de Paris 7 (Generative and discriminative models in parsing: experiments on the Paris 7 Treebank)
Joseph Le Roux | Benoît Favre | Seyed Abolghasem Mirroshandel | Alexis Nasr
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

We present a two-stage architecture for syntactic parsing. First, a constituency parser builds, for each sentence, a list of analyses, which are converted into dependency trees. These trees are then rescored by a discriminative reranker. This method makes it possible to take into account information that the parser does not have access to, in particular functional annotations. We validate our approach with an evaluation on the Paris 7 Treebank. The second stage significantly improves the quality of the returned analyses, whichever metric is used.

pdf bib abs
Création de clusters sémantiques dans des familles morphologiques à partir du TLFi (Creating semantic clusters in morphological families from the TLFi)
Nuria Gala | Nabil Hathout | Alexis Nasr | Véronique Rey | Selja Seppälä
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Building linguistic resources is a long and costly task. This is particularly true of morphological resources. Such resources describe, in depth and explicitly, the morphological organization of the lexicon, supplemented with semantic information that can be exploited in NLP. The work presented in this article falls within this perspective and, more specifically, aims to refine an existing resource by relying on automatically obtained semantic information. Our goal is to semantically characterize morpho-phonological families (words sharing the same root and a continuity of meaning). To do so, we used information extracted from the morpho-syntactically annotated TLFi. The first results of this work are analyzed and discussed.

pdf bib
Active Learning Strategies for Support Vector Machines, Application to Temporal Relation Classification
Seyed Abolghasem Mirroshandel | Gholamreza Ghassem-Sani | Alexis Nasr
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
MACAON An NLP Tool Suite for Processing Word Lattices
Alexis Nasr | Frédéric Béchet | Jean-François Rey | Benoît Favre | Joseph Le Roux
Proceedings of the ACL-HLT 2011 System Demonstrations

pdf bib
Active Learning for Dependency Parsing Using Partially Annotated Sentences
Seyed Abolghasem Mirroshandel | Alexis Nasr
Proceedings of the 12th International Conference on Parsing Technologies

2010

pdf bib
MACAON Une chaîne linguistique pour le traitement de graphes de mots (MACAON: a linguistic processing chain for word graphs)
Alexis Nasr | Frédéric Béchet | Jean-François Rey
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations

2009

pdf bib abs
Analyse syntaxique en dépendances de l’oral spontané (Dependency parsing of spontaneous speech)
Alexis Nasr | Frédéric Béchet
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

This article describes a syntactic parsing model for spontaneous speech focused on recognizing verbal valency frames. The parsing model is broken down into two stages: a generic stage, based on generic resources for French, and a stage in which the parser’s solutions are reranked by an application-specific model. The model is evaluated on the MEDIA corpus.

pdf bib
Un modèle formel de descriptions lexicales: Formalisme BDéf et structures de traits typées [A formal model for lexical descriptions: Typed feature-structure in the formalism BDéf]
Lucie Barque | Alexis Nasr
Traitement Automatique des Langues, Volume 50, Numéro 1 : Varia [Varia]

pdf bib
MICA: A Probabilistic Dependency Parser Based on Tree Insertion Grammars (Application Note)
Srinivas Bangalore | Pierre Boullier | Alexis Nasr | Owen Rambow | Benoît Sagot
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf bib
Constructing parse forests that include exactly the n-best PCFG trees
Pierre Boullier | Alexis Nasr | Benoît Sagot
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

2004

pdf bib abs
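Couplage d’un étiqueteur morpho-syntaxique et d’un analyseur partiel représentés sous la forme d’automates finis pondérés (Coupling a part-of-speech tagger and a partial parser represented as weighted finite-state automata)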
Alexis Nasr | Alexandra Volanschi
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

This article presents a way of integrating a part-of-speech tagger and a partial parser. This integration makes it possible to correct errors made by the tagger alone. The tagger and the parser were implemented as weighted automata. Results on a French corpus showed a reduction in the error rate of about 12%.

pdf bib
Tagging with Hidden Markov Models Using Ambiguous Tags
Alexis Nasr | Frédéric Béchet | Alexandra Volanschi
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
A Simple String-Rewriting Formalism for Dependency Grammar
Alexis Nasr | Owen Rambow
Proceedings of the Workshop on Recent Advances in Dependency Grammar

pdf bib
SuperTagging and Full Parsing
Alexis Nasr | Owen Rambow
Proceedings of the 7th International Workshop on Tree Adjoining Grammar and Related Formalisms

This paper reports on an experiment in assembling a domain-specific machine translation prototype system from off-the-shelf components. The design goals of this experiment were to reuse existing components, to use machine-learning techniques for parser specialization and for transfer lexicon extraction, and to use an expressive, lexicalized formalism for the transfer component.

pdf bib
Pseudo-Projectivity: A Polynomially Parsable Non-Projective Dependency Grammar
Sylvain Kahane | Alexis Nasr | Owen Rambow
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

pdf bib
Pseudo-Projectivity, A Polynomially Parsable Non-Projective Dependency Grammar
Sylvain Kahane | Alexis Nasr | Owen Rambow
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1