2022
Is NLP Ready for Standardization?
Lauriane Aufrant
Findings of the Association for Computational Linguistics: EMNLP 2022
While standardization is a well-established activity in other scientific fields such as telecommunications, networks, or multimedia, in AI, and more specifically in NLP, it is still in its infancy. In this paper, we explore how various aspects of NLP (evaluation, data, tasks, etc.) lack standards, and how this can affect not only science but also society, industry, and regulation. We argue that the numerous initiatives to rationalize the field and establish good practices are only a first step, and that formal standards are still needed to bring further clarity to NLP research and industry, at a time when this community faces various crises regarding ethics and reproducibility. We therefore encourage NLP researchers to contribute to existing and upcoming standardization projects, so that they can express their needs and concerns while sharing their expertise.
2018
Quantifying training challenges of dependency parsers
Lauriane Aufrant, Guillaume Wisniewski, François Yvon
Proceedings of the 27th International Conference on Computational Linguistics
Not all dependencies are equal when training a dependency parser: some are simple enough to be learned from only a sample of the data, while others embed more complexity. This work introduces a series of metrics to quantify these differences, and thereby to expose the shortcomings of various parsing algorithms and strategies. Beyond enabling a more thorough comparison of parsing systems, these new tools also prove useful for characterizing the information conveyed by cross-lingual parsers, in a quantitative yet interpretable way.
Exploiting Dynamic Oracles to Train Projective Dependency Parsers on Non-Projective Trees
Lauriane Aufrant, Guillaume Wisniewski, François Yvon
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Because the most common transition systems are projective, training a transition-based dependency parser often means either ignoring or rewriting the non-projective training examples, which hurts accuracy. In this work, we propose a simple modification of dynamic oracles that enables the use of non-projective data when training projective parsers. Evaluation on 73 treebanks shows that our method achieves significant gains (+2 to +7 UAS for the most non-projective languages) and consistently outperforms the traditional projectivization and pseudo-projectivization approaches.
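To make "non-projective" concrete: a dependency tree is projective when none of its arcs cross when drawn above the sentence, with an artificial root placed at position 0. The Python sketch below checks this property by brute force (quadratic in the number of arcs). The head-list encoding (`heads[i]` is the head of token i+1, 0 for the root) and the function name are assumptions of this example, not code from the paper.

```python
def is_projective(heads: list[int]) -> bool:
    """Return True if the dependency tree has no crossing arcs.

    heads[i] is the head of token i+1 (tokens are 1-indexed, 0 is the root).
    """
    # Represent every arc by its (left, right) span over positions, root = 0.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # Two arcs cross when the second starts strictly inside the first
            # and ends strictly outside it.
            if l1 < l2 < r1 < r2:
                return False
    return True
```

For instance, `is_projective([3, 4, 0, 3])` is `False`: the arcs 3→1 and 4→2 cross. Such a tree cannot be produced by a projective transition system, which is why it is usually dropped or rewritten before training.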
2017
LIMSI@CoNLL’17: UD Shared Task
Lauriane Aufrant, Guillaume Wisniewski, François Yvon
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
This paper describes LIMSI’s submission to the CoNLL 2017 UD Shared Task, which focuses on small treebanks and on how to improve low-resource parsing solely through the ad hoc combination of multiple views and resources. We present our approach to low-resource parsing, together with a detailed analysis of the results for each test treebank. We also report extensive analysis experiments on model selection for the PUD treebanks and on annotation consistency across UD treebanks.
Don’t Stop Me Now! Using Global Dynamic Oracles to Correct Training Biases of Transition-Based Dependency Parsers
Lauriane Aufrant, Guillaume Wisniewski, François Yvon
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
This paper formalizes a sound extension of dynamic oracles to global training, in the framework of transition-based dependency parsers. By dispensing with the pre-computation of references, this extension widens the range of training strategies that can be considered for such parsers; we show this by revisiting two standard training procedures, early update and max-violation, to correct some of their search-space sampling biases. Experimentally, on the SPMRL treebanks, this improvement increases the similarity between the training and test distributions and yields performance gains of up to 0.7 UAS, without any computational overhead.
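The gains reported throughout these abstracts are expressed in UAS (unlabeled attachment score): the percentage of tokens whose predicted head is the gold head. A gain of "+0.7 UAS" is therefore an absolute difference of 0.7 percentage points. The minimal Python sketch below computes corpus-level UAS. It assumes the same head-list encoding as a CoNLL-style HEAD column (0 for the root) and scores every token. Whether punctuation is excluded varies between evaluation setups, and this sketch makes no claim about the setup used in these papers.

```python
def uas(gold: list[list[int]], pred: list[list[int]]) -> float:
    """Unlabeled attachment score (in %) over a corpus of sentences.

    Each sentence is a list of head indices: entry i is the head of token i+1,
    0 = root. Every token is scored (no punctuation filtering).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in size")
    correct = total = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentences differ in length")
        # Count tokens whose predicted head matches the gold head.
        correct += sum(g == p for g, p in zip(g_sent, p_sent))
        total += len(g_sent)
    return 100.0 * correct / total if total else 0.0
```

UAS is computed over tokens, not sentences, so long sentences weigh more than short ones in the corpus score.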
2016
Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian
Lauriane Aufrant, Guillaume Wisniewski, François Yvon
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Because Romanian corpora are small, the performance of a PoS tagger or a dependency parser trained with standard supervised methods falls far short of the performance achieved for most languages. We therefore apply state-of-the-art cross-lingual transfer methods to Romanian tagging and parsing, transferring from English and several Romance languages. We compare their performance with monolingual systems trained on datasets of different sizes, and establish that training on a few sentences in the target language yields better results than transferring from large datasets in other languages.
Apprentissage d’analyseur en dépendances cross-lingue par projection partielle de dépendances (Cross-lingual learning of dependency parsers from partially projected dependencies)
Ophélie Lacroix, Lauriane Aufrant, Guillaume Wisniewski, François Yvon
Proceedings of the joint conference JEP-TALN-RECITAL 2016, Volume 2: TALN (Long Papers)
This article presents a simple method for cross-lingual transfer of dependencies. We first show that a transition-based dependency parser can be learned from partially annotated data. We then propose to build large partially annotated datasets for several target languages by projecting dependencies through the most reliable alignment links. By learning parsers for the target languages from these partial data, we show that this simple method achieves performance that rivals that of recent state-of-the-art methods, at a lower algorithmic cost.
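The projection step of this approach can be pictured with a minimal Python sketch. It rests on two assumptions. First, the alignment links have already been filtered down to reliable one-to-one links; how they are selected is not shown here. Second, unannotated target tokens are marked with `None`. The function name and encoding are hypothetical. Training a transition-based parser on the resulting partial trees, the other half of the method, is not shown.

```python
from typing import Optional


def project_partial(src_heads: list[int], align: dict[int, int],
                    tgt_len: int) -> list[Optional[int]]:
    """Project source dependencies onto a target sentence via alignment links.

    src_heads[i] is the head of source token i+1 (0 = root).
    align maps a source token index to a target token index (both 1-indexed)
    and is assumed to contain only high-confidence, one-to-one links.
    Returns target heads, with None for tokens that received no projected arc.
    """
    tgt_heads: list[Optional[int]] = [None] * tgt_len
    for dep, head in enumerate(src_heads, start=1):
        if dep not in align:
            continue  # unaligned dependent: nothing to project
        if head == 0:
            tgt_heads[align[dep] - 1] = 0  # root attachment carries over
        elif head in align:
            tgt_heads[align[dep] - 1] = align[head]
        # otherwise the head is unaligned and the target token stays unannotated
    return tgt_heads
```

With `src_heads = [2, 3, 0]` ("the cat sleeps") and only the links `{2: 2, 3: 3}` kept, the result is `[None, 3, 0]`: the determiner gets no annotation, which is exactly the kind of partial tree the parser must learn from.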
Ne nous arrêtons pas en si bon chemin : améliorations de l’apprentissage global d’analyseurs en dépendances par transition (Don’t Stop Me Now! Improved Update Strategies for Global Training of Transition-Based Dependency Parsers)
Lauriane Aufrant, Guillaume Wisniewski, François Yvon
Proceedings of the joint conference JEP-TALN-RECITAL 2016, Volume 2: TALN (Long Papers)
In this article, we propose three simple improvements to the global training of ArcEager transition-based dependency parsers: a non-deterministic oracle, resuming on the same example after an update, and training in suboptimal configurations. Their combination brings an average gain of 0.2 UAS on the SPMRL corpus. We also introduce a general framework for systematically comparing these strategies and most of the known variants. We show that the literature has studied only a few strategies among the many possible variations, thus neglecting several potential avenues for improvement.
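For readers unfamiliar with ArcEager: its four transitions are SHIFT, LEFT-ARC, RIGHT-ARC and REDUCE, applied to a stack and a buffer. The Python sketch below implements them with the classic static oracle, which deterministically picks a single gold transition at each step. It is not the non-deterministic oracle or the global training strategies proposed in this paper. The head-list encoding, transition labels, and function name are assumptions of this example. The oracle can only replay projective trees, and it raises an error when it gets stuck on a non-projective one.

```python
from typing import Optional

SHIFT, LEFT_ARC, RIGHT_ARC, REDUCE = "SH", "LA", "RA", "RE"


def arc_eager_oracle(gold: list[int]) -> tuple[list[str], list[Optional[int]]]:
    """Replay a gold tree with ArcEager transitions chosen by a static oracle.

    gold[i] is the head of token i+1 (0 = artificial root). Returns the
    transition sequence and the heads it builds.
    """
    n = len(gold)
    g = [None] + gold  # g[d] = gold head of token d (1-indexed)
    heads: list[Optional[int]] = [None] * (n + 1)
    stack, buffer = [0], list(range(1, n + 1))
    transitions: list[str] = []
    while buffer:
        s, b = stack[-1], buffer[0]
        if s != 0 and g[s] == b:
            heads[s] = b  # LEFT-ARC: b becomes the head of s, pop s
            stack.pop()
            transitions.append(LEFT_ARC)
        elif g[b] == s:
            heads[b] = s  # RIGHT-ARC: s becomes the head of b, push b
            stack.append(buffer.pop(0))
            transitions.append(RIGHT_ARC)
        elif any(g[b] == k or (k != 0 and g[k] == b) for k in stack[:-1]):
            # b is linked to a token deeper in the stack: s must be popped first.
            if heads[s] is None:
                raise ValueError("cannot reduce a headless token: tree is not projective")
            stack.pop()
            transitions.append(REDUCE)
        else:
            stack.append(buffer.pop(0))  # SHIFT
            transitions.append(SHIFT)
    return transitions, heads[1:]
```

For "She ate fish quickly" with gold heads `[2, 0, 2, 2]`, the oracle returns SH, LA, RA, RA, RE, RA. REDUCE pops "fish" so that "quickly" can attach to "ate". A static oracle like this only covers the gold path; dynamic oracles extend it to configurations reached after mistakes.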
Frustratingly Easy Cross-Lingual Transfer for Transition-Based Dependency Parsing
Ophélie Lacroix, Lauriane Aufrant, Guillaume Wisniewski, François Yvon
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Cross-lingual alignment transfer: a chicken-and-egg story?
Lauriane Aufrant, Guillaume Wisniewski, François Yvon
Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP
LIMSI@WMT’16: Machine Translation of News
Alexandre Allauzen, Lauriane Aufrant, Franck Burlot, Ophélie Lacroix, Elena Knyazeva, Thomas Lavergne, Guillaume Wisniewski, François Yvon
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
The QT21/HimL Combined Machine Translation System
Jan-Thorsten Peter, Tamer Alkhouli, Hermann Ney, Matthias Huck, Fabienne Braune, Alexander Fraser, Aleš Tamchyna, Ondřej Bojar, Barry Haddow, Rico Sennrich, Frédéric Blain, Lucia Specia, Jan Niehues, Alex Waibel, Alexandre Allauzen, Lauriane Aufrant, Franck Burlot, Elena Knyazeva, Thomas Lavergne, François Yvon, Mārcis Pinnis, Stella Frank
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
Zero-resource Dependency Parsing: Boosting Delexicalized Cross-lingual Transfer with Linguistic Knowledge
Lauriane Aufrant, Guillaume Wisniewski, François Yvon
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
This paper studies cross-lingual transfer for dependency parsing, focusing on very low-resource settings where delexicalized transfer is the only fully automatic option. We show how to boost parsing performance by rewriting the source sentences so that they better match the linguistic regularities of the target language. We contrast a data-driven approach with an approach relying on linguistically motivated rules automatically extracted from the World Atlas of Language Structures. Our findings are backed by experiments involving 40 languages. They show that both approaches greatly outperform the baseline, with the knowledge-driven method yielding the best accuracies: average improvements of +2.9 UAS, and up to +90 UAS (absolute) on some frequent PoS configurations.
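The two ingredients of this approach can be illustrated with a minimal Python sketch under stated assumptions. Delexicalization replaces word forms by PoS tags, so a parser trained on the source language never sees its vocabulary. The rewrite rule is a hypothetical example of the kind of reordering a WALS feature on adjective-noun order could motivate, for a target language where adjectives follow nouns. The `Token` class, the function names, and the rule itself are illustrative only; they are not the rules extracted in the paper.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    upos: str
    head: int  # 1-indexed head position, 0 = root


def delexicalize(sent: list[Token]) -> list[Token]:
    """Replace word forms by PoS tags so the parser ignores the lexicon."""
    return [Token(t.upos, t.upos, t.head) for t in sent]


def move_adjectives_after_nouns(sent: list[Token]) -> list[Token]:
    """Hypothetical rule: an ADJ directly before its NOUN head moves after it.

    Token order changes, so every head index is remapped to the new positions.
    Only adjacent ADJ-NOUN pairs are handled (stacked adjectives are not).
    """
    order = list(range(1, len(sent) + 1))  # order[j] = old index at new position j+1
    j = 0
    while j < len(order) - 1:
        cur, nxt = sent[order[j] - 1], sent[order[j + 1] - 1]
        if cur.upos == "ADJ" and nxt.upos == "NOUN" and cur.head == order[j + 1]:
            order[j], order[j + 1] = order[j + 1], order[j]
            j += 2  # skip past the swapped pair
        else:
            j += 1
    new_pos = {old: new for new, old in enumerate(order, start=1)}
    new_pos[0] = 0  # the root stays at position 0
    return [Token(sent[old - 1].form, sent[old - 1].upos,
                  new_pos[sent[old - 1].head]) for old in order]
```

For example, "the big dog barks" (heads `[3, 3, 4, 0]`) becomes "the dog big barks" with heads `[2, 4, 2, 0]`. Delexicalizing and then reordering the sentence yields the training sequence DET NOUN ADJ VERB, which is closer to a noun-adjective target language.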