Yannick Parmentier

2024

pdf abs
Generating from AMRs into High and Low-Resource Languages using Phylogenetic Knowledge and Hierarchical QLoRA Training (HQL)
William Soto Martinez | Yannick Parmentier | Claire Gardent
Proceedings of the 17th International Natural Language Generation Conference

Multilingual generation from Abstract Meaning Representations (AMRs) verbalises AMRs into multiple languages. Previous work has focused on high- and medium-resource languages relying on large amounts of training data. In this work, we consider both high- and low-resource languages capping training data size at the lower bound set by our low-resource languages i.e. 31K. We propose a straightforward technique to enhance results on low-resource while preserving performance on high-resource languages. We iteratively refine a multilingua model to a set of monolingual models using Low-Rank Adaptation with a training curriculum based on a tree structure; this permits investigating how the languages used at each iteration impact generation performance on high and low-resource languages. We show an improvement over both mono and multilingual approaches. Comparing different ways of grouping languages at each iteration step we find two working configurations: grouping related languages which promotes transfer, or grouping distant languages which facilitates regularisation

2023

pdf
Phylogeny-Inspired Soft Prompts For Data-to-Text Generation in Low-Resource Languages
William Soto Martinez | Yannick Parmentier | Claire Gardent
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf abs
Towards Sentence-level Text Readability Assessment for French
Duy Van Ngo | Yannick Parmentier
Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability

In this paper, we report on some experiments aimed at exploring the relation between document-level and sentence-level readability assessment for French. These were run on an open-source tailored corpus, which was automatically created by aggregating various sources from children’s literature. On top of providing the research community with a freely available corpus, we report on sentence readability scores obtained when applying both classical approaches (aka readability formulas) and state-of-the-art deep learning techniques (e.g. fine-tuning of large language models). Results show a relatively strong correlation between document-level and sentence-level readability, suggesting ways to reduce the cost of building annotated sentence-level readability datasets.

2021

pdf abs
An Error Analysis Framework for Shallow Surface Realization
Anastasia Shimorina | Yannick Parmentier | Claire Gardent
Transactions of the Association for Computational Linguistics, Volume 9

The metrics standardly used to evaluate Natural Language Generation (NLG) models, such as BLEU or METEOR, fail to provide information on which linguistic factors impact performance. Focusing on Surface Realization (SR), the task of converting an unordered dependency tree into a well-formed sentence, we propose a framework for error analysis which permits identifying which features of the input affect the models’ results. This framework consists of two main components: (i) correlation analyses between a wide range of syntactic metrics and standard performance metrics and (ii) a set of techniques to automatically identify syntactic constructs that often co-occur with low performance scores. We demonstrate the advantages of our framework by performing error analysis on the results of 174 system runs submitted to the Multilingual SR shared tasks; we show that dependency edge accuracy correlate with automatic metrics thereby providing a more interpretable basis for evaluation; and we suggest ways in which our framework could be used to improve models and data. The framework is available in the form of a toolkit which can be used both by campaign organizers to provide detailed, linguistically interpretable feedback on the state of the art in multilingual SR, and by individual researchers to improve models and datasets.1

2018

pdf abs
Interface syntaxe-sémantique au moyen d’une grammaire d’arbres adjoints pour l’étiquetage sémantique de l’arabe (Syntax-semantic interface using Tree-adjoining grammar for Arabic semantic labeling)
Cherifa Ben Khelil | Chiraz Ben Othmane Zribi | Denys Duchier | Yannick Parmentier
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Dans une grammaire formelle, le lien entre l’information sémantique et sa structure syntaxique correspondante peut être établi en utilisant une interface syntaxe/sémantique qui permettra la construction du sens de la phrase. L’étiquetage de rôles sémantiques aide à réaliser cette tâche en associant automatiquement des rôles sémantiques à chaque argument du prédicat d’une phrase. Dans ce papier, nous présentons une nouvelle approche qui permet la construction d’une telle interface pour une grammaire d’arbres adjoints de l’arabe. Cette grammaire a été générée semi automatiquement à partir d’une méta-grammaire. Nous détaillons le processus d’interfaçage entre le niveau syntaxique et le niveau sémantique moyennant la sémantique des cadres et comment avons-nous procédé à l’étiquetage de rôles sémantiques en utilisant la ressource lexicale ArabicVerbNet.

2017

pdf
Multiword Expression-Aware A* TAG Parsing Revisited
Jakub Waszczuk | Agata Savary | Yannick Parmentier
Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms

pdf bib abs
Annotation d’expressions polylexicales verbales en français (Annotation of verbal multiword expressions in French)
Marie Candito | Mathieu Constant | Carlos Ramisch | Agata Savary | Yannick Parmentier | Caroline Pasquer | Jean-Yves Antoine
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 2 - Articles courts

Nous décrivons la partie française des données produites dans le cadre de la campagne multilingue PARSEME sur l’identification d’expressions polylexicales verbales (Savary et al., 2017). Les expressions couvertes pour le français sont les expressions verbales idiomatiques, les verbes intrinsèquement pronominaux et une généralisation des constructions à verbe support. Ces phénomènes ont été annotés sur le corpus French-UD (Nivre et al., 2016) et le corpus Sequoia (Candito & Seddah, 2012), soit un corpus de 22 645 phrases, pour un total de 4 962 expressions annotées. On obtient un ratio d’une expression annotée tous les 100 tokens environ, avec un fort taux d’expressions discontinues (40%).

pdf abs
Un outil pour la manipulation de ressources arborées (A tool for handling tree-based linguistic resources)
Yannick Parmentier
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 3 - Démonstrations

Dans cet article, nous présentons brièvement pytreeview, un outil pour la manipulation de ressources arborées (corpus annotés, grammaires électroniques). Initialement conçu pour assiter les utilisateurs linguistes dans leur tâche de développement de grammaires arborescentes, pytreeview a été étendu pour permettre de manipuler des ressources arborées variées (grammaires mais aussi corpus aux formats FTB, PTB, CoNLL, Tiger), afin d’en extraire des informations utiles (par exemple la distribution des cadres de sous-catégorisation). pytreeview est actuellement utilisé dans le cadre d’un projet visant l’extraction semi-automatique de grammaires abstraites (méta-grammaires) à partir de corpus arborés.

2016

pdf bib
ArabTAG: from a Handcrafted to a Semi-automatically Generated TAG
Chérifa Ben Khelil | Denys Duchier | Yannick Parmentier | Chiraz Zribi | Fériel Ben Fraj
Proceedings of the 12th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+12)

pdf abs
Promoting multiword expressions in A* TAG parsing
Jakub Waszczuk | Agata Savary | Yannick Parmentier
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Multiword expressions (MWEs) are pervasive in natural languages and often have both idiomatic and compositional readings, which leads to high syntactic ambiguity. We show that for some MWE types idiomatic readings are usually the correct ones. We propose a heuristic for an A* parser for Tree Adjoining Grammars which benefits from this knowledge by promoting MWE-oriented analyses. This strategy leads to a substantial reduction in the parsing search space in case of true positive MWE occurrences, while avoiding parsing failures in case of false positives.

Nous présentons ici différents algorithmes d’analyse pour grammaires à concaténation d’intervalles (Range Concatenation Grammar, RCG), dont un nouvel algorithme de type Earley, dans le paradigme de l’analyse déductive. Notre travail est motivé par l’intérêt porté récemment à ce type de grammaire, et comble un manque dans la littérature existante.

pdf
Convertir des grammaires d’arbres adjoints à composantes multiples avec tuples d’arbres (TT-MCTAG) en grammaires à concaténation d’intervalles (RCG) [Converting tree tuple multicomponent tree adjoining grammars (TT-MCTAGs) into range concatenation grammars (RCGs)]
Laura Kallmeyer | Yannick Parmentier
Traitement Automatique des Langues, Volume 50, Numéro 1 : Varia [Varia]

pdf
An Earley Parsing Algorithm for Range Concatenation Grammars
Laura Kallmeyer | Wolfgang Maier | Yannick Parmentier
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

pdf bib
TuLiPA: Towards a Multi-Formalism Parsing Environment for Grammar Engineering
Laura Kallmeyer | Timm Lichte | Wolfgang Maier | Yannick Parmentier | Johannes Dellert | Kilian Evang
Coling 2008: Proceedings of the workshop on Grammar Engineering Across Frameworks

pdf
TuLiPA: A syntax-semantics parsing environment for mildly context-sensitive formalisms
Yannick Parmentier | Laura Kallmeyer | Wolfgang Maier | Timm Lichte | Johannes Dellert
Proceedings of the Ninth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+9)

pdf abs
Convertir des grammaires d’arbres adjoints à composantes multiples avec tuples d’arbres (TT-MCTAG) en grammaires à concaténation d’intervalles (RCG)
Laura Kallmeyer | Yannick Parmentier
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article étudie la relation entre les grammaires d’arbres adjoints à composantes multiples avec tuples d’arbres (TT-MCTAG), un formalisme utilisé en linguistique informatique, et les grammaires à concaténation d’intervalles (RCG). Les RCGs sont connues pour décrire exactement la classe PTIME, il a en outre été démontré que les RCGs « simples » sont même équivalentes aux systèmes de réécriture hors-contextes linéaires (LCFRS), en d’autres termes, elles sont légèrement sensibles au contexte. TT-MCTAG a été proposé pour modéliser les langages à ordre des mots libre. En général ces langages sont NP-complets. Dans cet article, nous définissons une contrainte additionnelle sur les dérivations autorisées par le formalisme TT-MCTAG. Nous montrons ensuite comment cette forme restreinte de TT-MCTAG peut être convertie en une RCG simple équivalente. Le résultat est intéressant pour des raisons théoriques (puisqu’il montre que la forme restreinte de TT-MCTAG est légèrement sensible au contexte), mais également pour des raisons pratiques (la transformation proposée ici a été utilisée pour implanter un analyseur pour TT-MCTAG).

pdf abs
Developing a TT-MCTAG for German with an RCG-based Parser
Laura Kallmeyer | Timm Lichte | Wolfgang Maier | Yannick Parmentier | Johannes Dellert
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Developing linguistic resources, in particular grammars, is known to be a complex task in itself, because of (amongst others) redundancy and consistency issues. Furthermore some languages can reveal themselves hard to describe because of specific characteristics, e.g. the free word order in German. In this context, we present (i) a framework allowing to describe tree-based grammars, and (ii) an actual fragment of a core multicomponent tree-adjoining grammar with tree tuples (TT-MCTAG) for German developed using this framework. This framework combines a metagrammar compiler and a parser based on range concatenation grammar (RCG) to respectively check the consistency and the correction of the grammar. The German grammar being developed within this framework already deals with a wide range of scrambling and extraction phenomena.

2007

pdf abs
SemTAG, une architecture pour le développement et l’utilisation de grammaires d’arbres adjoints à portée sémantique
Claire Gardent | Yannick Parmentier
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Dans cet article, nous présentons une architecture logicielle libre et ouverte pour le développement de grammaires d’arbres adjoints à portée sémantique. Cette architecture utilise un compilateur de métagrammaires afin de faciliter l’extension et la maintenance de la grammaire, et intègre un module de construction sémantique permettant de vérifier la couverture aussi bien syntaxique que sémantique de la grammaire. Ce module utilise un analyseur syntaxique tabulaire généré automatiquement à partir de la grammaire par le système DyALog. Nous présentons également les résultats de l’évaluation d’une grammaire du français développée au moyen de cette architecture.

pdf
SemTAG: a platform for specifying Tree Adjoining Grammars and performing TAG-based Semantic Construction
Claire Gardent | Yannick Parmentier
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

pdf
Coreference Handling in XMG
Claire Gardent | Yannick Parmentier
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf
XMG - An Expressive Formalism for Describing Tree-Based Grammars
Yannick Parmentier | Joseph Le Roux | Benoît Crabbé
Demonstrations

pdf bib
A Constraint Driven Metagrammar
Joseph Le Roux | Benoît Crabbé | Yannick Parmentier
Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms

pdf
SemTAG, the LORIA toolbox for TAG-based Parsing and Generation
Eric Kow | Yannick Parmentier | Claire Gardent
Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms

2005

pdf bib abs
XMG : un Compilateur de Méta-Grammaires Extensible
Denys Duchier | Joseph Le Roux | Yannick Parmentier
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Dans cet article, nous présentons un outil permettant de produire automatiquement des ressources linguistiques, en l’occurence des grammaires. Cet outil se caractérise par son extensibilité, tant du point de vue des formalismes grammaticaux supportés (grammaires d’arbres adjoints et grammaires d’interaction à l’heure actuelle), que de son architecture modulaire, qui facilite l’intégration de nouveaux modules ayant pour but de vérifier la validité des structures produites. En outre, cet outil offre un support adapté au développement de grammaires à portée sémantique.

Venues

cl1

ws1

tal1