2024
Évaluation de l’apport des chaînes de coréférences pour le liage d’entités
Léo Labat
|
Lauriane Aufrant
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position
This work revisits entity linking approaches in light of the closely related task of coreference resolution. We observe various configurations (supported by examples) in which the rest of the coreference chain can provide useful cues to improve disambiguation. Guided by these theoretical motivations, we conduct an error analysis accompanied by oracle experiments, which confirm the potential of strategies that combine predictions within the coreference chain (up to 4.3 F1 on coreferent mentions in English). We then sketch a first proof of concept of vote-based combination, exploring various weighting heuristics, which brings modest but interpretable gains.
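As a minimal sketch of how such a vote-based combination could look (an illustration under assumed data structures and names, not the proof of concept evaluated in the paper), each coreferent mention contributes its own entity-linking candidate, and a pluggable weighting heuristic decides how much each mention counts toward the chain-level decision:

from collections import defaultdict

def vote_within_chain(mention_predictions, weight_fn=None):
    """Combine entity-linking predictions across one coreference chain.

    mention_predictions: list of (candidate_entity, confidence) pairs,
    one per mention of the chain (structure assumed for illustration).
    weight_fn: weighting heuristic, e.g. favouring named or longer mentions.
    """
    weight_fn = weight_fn or (lambda index, confidence: confidence)
    tally = defaultdict(float)
    for index, (entity, confidence) in enumerate(mention_predictions):
        tally[entity] += weight_fn(index, confidence)
    # The chain-level winner overrides the individual mention-level choices.
    return max(tally, key=tally.get)

# Toy chain of three mentions, two of which agree on the same entity.
print(vote_within_chain([("Paris_city", 0.6), ("Paris_Hilton", 0.55), ("Paris_city", 0.4)]))

Swapping in a different weight_fn is where weighting heuristics of the kind explored in the paper would plug in.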
UkraiNER: A New Corpus and Annotation Scheme towards Comprehensive Entity Recognition
Lauriane Aufrant
|
Lucie Chasseur
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Named entity recognition as it is traditionally envisioned excludes in practice a significant part of the entities of potential interest for real-world applications: nested, discontinuous, non-named entities. Despite various attempts to broaden their coverage, subsequent annotation schemes have achieved little adoption in the literature and the most restrictive variant of NER remains the default. This is partly due to the complexity of those annotations and their format. In this paper, we introduce a new annotation scheme that offers higher comprehensiveness while preserving simplicity, together with an annotation tool to implement that scheme. We also release the UkraiNER corpus, comprising 10,000 French sentences in the geopolitical news domain manually annotated with comprehensive entity recognition. Our baseline experiments on UkraiNER provide a first point of comparison to facilitate future research (82 F1 for comprehensive entity recognition, 87 F1 when focusing on traditional nested NER), as well as various insights on the composition of this corpus and the challenges it presents for state-of-the-art named entity recognition models.
2023
CEN-CENELEC JTC 21 : La standardisation en TALN au service du règlement européen sur l’IA
Lauriane Aufrant
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 6 : projets
This contribution presents the work of the European AI standardization committee on NLP. The CEN-CENELEC JTC 21 committee has been mandated by the European Commission to develop the technical standards supporting the implementation of the upcoming European AI regulation: performance, robustness, transparency, etc. In this context, NLP has been identified as a specific area of AI deserving its own tools, criteria and good practices. This observation has led to an ambitious roadmap including several NLP standardization projects. To date, a first inventory and definition of NLP tasks has already been initiated, and the drafting of a standard on evaluation metrics is starting. This work has also prompted a broader reflection on NLP standardization needs, including a taxonomy of methods and work on annotation formats and interoperability.
2022
Is NLP Ready for Standardization?
Lauriane Aufrant
Findings of the Association for Computational Linguistics: EMNLP 2022
While standardization is a well-established activity in other scientific fields such as telecommunications, networks or multimedia, in the field of AI and more specifically NLP it is still in its infancy. In this paper, we explore how various aspects of NLP (evaluation, data, tasks...) lack standards and how that can impact not only science, but also society, industry, and regulation. We argue that the numerous initiatives to rationalize the field and establish good practices are only a first step, and that developing formal standards remains necessary to bring further clarity to NLP research and industry, at a time when this community faces various crises regarding ethics and reproducibility. We thus encourage NLP researchers to contribute to existing and upcoming standardization projects, so that they can express their needs and concerns while sharing their expertise.
2018
Exploiting Dynamic Oracles to Train Projective Dependency Parsers on Non-Projective Trees
Lauriane Aufrant
|
Guillaume Wisniewski
|
François Yvon
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Because the most common transition systems are projective, training a transition-based dependency parser often implies either ignoring or rewriting the non-projective training examples, which has an adverse impact on accuracy. In this work, we propose a simple modification of dynamic oracles which enables the use of non-projective data when training projective parsers. Evaluation on 73 treebanks shows that our method achieves significant gains (+2 to +7 UAS for the most non-projective languages) and consistently outperforms traditional projectivization and pseudo-projectivization approaches.
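To make the starting problem concrete, here is a minimal sketch (an illustration of the projectivity constraint itself, not the modified dynamic oracle proposed in the paper) that flags the crossing arcs a projective transition system cannot derive, i.e. the arcs that usually force such sentences to be dropped or projectivized:

from itertools import combinations

def crossing_arcs(heads):
    """Return the gold arcs involved in at least one crossing.

    heads[i] is the head position of token i+1 (1-based tokens, 0 = root).
    A tree containing crossing arcs is non-projective, hence unreachable
    for a projective transition system.
    """
    arcs = [(head, dep) for dep, head in enumerate(heads, start=1)]
    crossing = set()
    for (h1, d1), (h2, d2) in combinations(arcs, 2):
        a, b = sorted((h1, d1))
        c, d = sorted((h2, d2))
        if a < c < b < d or c < a < d < b:  # spans overlap without nesting
            crossing.update({(h1, d1), (h2, d2)})
    return crossing

# Toy sentence whose arc 3 -> 5 crosses the arc 2 -> 4.
print(sorted(crossing_arcs([2, 0, 4, 2, 3])))  # [(2, 4), (3, 5)]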
Quantifying training challenges of dependency parsers
Lauriane Aufrant
|
Guillaume Wisniewski
|
François Yvon
Proceedings of the 27th International Conference on Computational Linguistics
Not all dependencies are equal when training a dependency parser: some are straightforward enough to be learned from only a sample of data, while others embed more complexity. This work introduces a series of metrics to quantify those differences, and thereby to expose the shortcomings of various parsing algorithms and strategies. Apart from a more thorough comparison of parsing systems, these new tools also prove useful for characterizing the information conveyed by cross-lingual parsers, in a quantitative but still interpretable way.
2017
LIMSI@CoNLL’17: UD Shared Task
Lauriane Aufrant
|
Guillaume Wisniewski
|
François Yvon
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
This paper describes LIMSI's submission to the CoNLL 2017 UD Shared Task, a submission focused on small treebanks and on improving low-resource parsing solely through ad hoc combinations of multiple views and resources. We present our approach for low-resource parsing, together with a detailed analysis of the results for each test treebank. We also report extensive analysis experiments on model selection for the PUD treebanks, and on annotation consistency among UD treebanks.
Don’t Stop Me Now! Using Global Dynamic Oracles to Correct Training Biases of Transition-Based Dependency Parsers
Lauriane Aufrant
|
Guillaume Wisniewski
|
François Yvon
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
This paper formalizes a sound extension of dynamic oracles to global training, in the framework of transition-based dependency parsing. By dispensing with the pre-computation of references, this extension widens the range of training strategies that can be entertained for such parsers; we show this by revisiting two standard training procedures, early-update and max-violation, to correct some of their search space sampling biases. Experimentally, on the SPMRL treebanks, this improvement increases the similarity between the train and test distributions and yields performance improvements of up to 0.7 UAS, without any computational overhead.
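For context, the standard max-violation criterion revisited here can be sketched as follows (a generic illustration of beam-based global training, not the global dynamic oracle extension itself; the function and argument names are assumptions):

def max_violation_update_point(gold_prefix_scores, pred_prefix_scores):
    """Locate the update point used by max-violation training.

    Both arguments hold cumulative model scores after 1, 2, ... transitions,
    for the gold-derived sequence and the best beam hypothesis respectively.
    Returns the prefix length (0-based) with the largest violation, or None
    when the gold sequence is never outscored (no update needed).
    """
    margins = [pred - gold for gold, pred in zip(gold_prefix_scores, pred_prefix_scores)]
    worst = max(range(len(margins)), key=margins.__getitem__)
    return worst if margins[worst] > 0 else None

# Toy run of four transitions: the violation peaks after the third one.
print(max_violation_update_point([1.0, 2.0, 3.0, 4.5], [0.5, 2.5, 4.0, 4.2]))  # -> 2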
2016
Frustratingly Easy Cross-Lingual Transfer for Transition-Based Dependency Parsing
Ophélie Lacroix
|
Lauriane Aufrant
|
Guillaume Wisniewski
|
François Yvon
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Apprentissage d’analyseur en dépendances cross-lingue par projection partielle de dépendances (Cross-lingual learning of dependency parsers from partially projected dependencies)
Ophélie Lacroix
|
Lauriane Aufrant
|
Guillaume Wisniewski
|
François Yvon
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Articles longs)
This paper presents a simple method for cross-lingual dependency transfer. We first show that it is possible to train a transition-based dependency parser from partially annotated data. We then propose to build large partially annotated datasets for several target languages by projecting dependencies through the most reliable alignment links. By training parsers for the target languages on these partial data, we show that this simple method achieves performance competitive with recent state-of-the-art methods, at a lower algorithmic cost.
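A minimal sketch of the projection step described above, under assumed data structures (0-based head lists and an alignment dictionary carrying link confidences): only the most reliable links transfer a dependency, and every uncovered target token keeps an unknown head, which is what makes the resulting treebanks partial.

def project_dependencies(src_heads, alignment, tgt_len, threshold=0.9):
    """Project source dependencies onto the target side through confident links.

    src_heads: src_heads[i] is the head of source token i (-1 for the root).
    alignment: source index -> (target index, link confidence).
    Returns a partial head list where None means "left unannotated".
    """
    tgt_heads = [None] * tgt_len
    for src_dep, src_head in enumerate(src_heads):
        dep_link = alignment.get(src_dep)
        if dep_link is None or dep_link[1] < threshold:
            continue  # the dependent itself is not reliably aligned
        if src_head == -1:
            tgt_heads[dep_link[0]] = -1  # the root stays the root
        else:
            head_link = alignment.get(src_head)
            if head_link is not None and head_link[1] >= threshold:
                tgt_heads[dep_link[0]] = head_link[0]
    return tgt_heads

# Toy example: three source tokens, the last alignment link is too uncertain.
print(project_dependencies([1, -1, 1], {0: (0, 0.95), 1: (1, 0.92), 2: (2, 0.6)}, 3))
# -> [1, -1, None]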
Ne nous arrêtons pas en si bon chemin : améliorations de l’apprentissage global d’analyseurs en dépendances par transition (Don’t Stop Me Now ! Improved Update Strategies for Global Training of Transition-Based)
Lauriane Aufrant
|
Guillaume Wisniewski
|
François Yvon
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Articles longs)
In this paper, we propose three simple improvements for the global training of arc-eager transition-based dependency parsers: a non-deterministic oracle, resuming training on the same example after an update, and training in sub-optimal configurations. Their combination brings an average gain of 0.2 UAS on the SPMRL corpus. We also introduce a general framework enabling a systematic comparison of these strategies and of most known variants. We show that the literature has studied only a few strategies among the many possible variations, thereby overlooking several potential avenues for improvement.
Cross-lingual alignment transfer: a chicken-and-egg story?
Lauriane Aufrant
|
Guillaume Wisniewski
|
François Yvon
Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP
LIMSI@WMT’16: Machine Translation of News
Alexandre Allauzen
|
Lauriane Aufrant
|
Franck Burlot
|
Ophélie Lacroix
|
Elena Knyazeva
|
Thomas Lavergne
|
Guillaume Wisniewski
|
François Yvon
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
The QT21/HimL Combined Machine Translation System
Jan-Thorsten Peter
|
Tamer Alkhouli
|
Hermann Ney
|
Matthias Huck
|
Fabienne Braune
|
Alexander Fraser
|
Aleš Tamchyna
|
Ondřej Bojar
|
Barry Haddow
|
Rico Sennrich
|
Frédéric Blain
|
Lucia Specia
|
Jan Niehues
|
Alex Waibel
|
Alexandre Allauzen
|
Lauriane Aufrant
|
Franck Burlot
|
Elena Knyazeva
|
Thomas Lavergne
|
François Yvon
|
Mārcis Pinnis
|
Stella Frank
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian
Lauriane Aufrant
|
Guillaume Wisniewski
|
François Yvon
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Because of the small size of Romanian corpora, the performance of a PoS tagger or a dependency parser trained with standard supervised methods falls far short of the performance achieved in most languages. That is why we apply state-of-the-art methods for cross-lingual transfer on Romanian tagging and parsing, from English and several Romance languages. We compare the performance with monolingual systems trained on sets of different sizes and establish that training on a few sentences in the target language yields better results than transferring from large datasets in other languages.
Zero-resource Dependency Parsing: Boosting Delexicalized Cross-lingual Transfer with Linguistic Knowledge
Lauriane Aufrant
|
Guillaume Wisniewski
|
François Yvon
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
This paper studies cross-lingual transfer for dependency parsing, focusing on very low-resource settings where delexicalized transfer is the only fully automatic option. We show how to boost parsing performance by rewriting the source sentences so as to better match the linguistic regularities of the target language. We contrast a data-driven approach with an approach relying on linguistically motivated rules automatically extracted from the World Atlas of Language Structures. Our findings are backed up by experiments involving 40 languages. They show that both approaches greatly outperform the baseline, the knowledge-driven method yielding the best accuracies, with average improvements of +2.9 UAS, and up to +90 UAS (absolute) on some frequent PoS configurations.
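The source-side rewriting idea can be pictured with a single illustrative rule (a hypothetical sketch, not the paper's rule set extracted from the World Atlas of Language Structures): when the target language places adjectives after nouns, the delexicalized source-side training sentences are reordered accordingly before training the transfer parser.

def reorder_adjectives(tokens, pos_tags, adjective_after_noun):
    """Reorder adjective-noun pairs to mimic the target language's word order.

    tokens / pos_tags: parallel lists describing one (delexicalized) sentence.
    adjective_after_noun: True for noun-adjective languages such as French.
    """
    out_tokens, out_tags = list(tokens), list(pos_tags)
    for i in range(len(out_tags) - 1):
        if adjective_after_noun and out_tags[i] == "ADJ" and out_tags[i + 1] == "NOUN":
            # English-style "red car" becomes target-style "car red".
            out_tokens[i], out_tokens[i + 1] = out_tokens[i + 1], out_tokens[i]
            out_tags[i], out_tags[i + 1] = out_tags[i + 1], out_tags[i]
    return out_tokens, out_tags

print(reorder_adjectives(["a", "red", "car"], ["DET", "ADJ", "NOUN"], True))
# -> (['a', 'car', 'red'], ['DET', 'NOUN', 'ADJ'])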