Manon Scholivet


Typological Features for Multilingual Delexicalised Dependency Parsing
Manon Scholivet | Franck Dary | Alexis Nasr | Benoit Favre | Carlos Ramisch
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The existence of universal models to describe the syntax of languages has been debated for decades. The availability of resources such as the Universal Dependencies treebanks and the World Atlas of Language Structures make it possible to study the plausibility of universal grammar from the perspective of dependency parsing. Our work investigates the use of high-level language descriptions in the form of typological features for multilingual dependency parsing. Our experiments on multilingual parsing for 40 languages show that typological information can indeed guide parsers to share information between similar languages beyond simple language identification.

Méthodes de représentation de la langue pour l’analyse syntaxique multilingue (Language representation methods for multilingual syntactic parsing )
Manon Scholivet
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume III : RECITAL

L’existence de modèles universels pour décrire la syntaxe des langues a longtemps été débattue. L’apparition de ressources comme le World Atlas of Language Structures et les corpus des Universal Dependencies rend possible l’étude d’une grammaire universelle pour l’analyse syntaxique en dépendances. Notre travail se concentre sur l’étude de différentes représentations des langues dans des systèmes multilingues appris sur des corpus arborés de 37 langues. Nos tests d’analyse syntaxique montrent que représenter la langue dont est issu chaque mot permet d’obtenir de meilleurs résultats qu’en cas d’un apprentissage sur une simple concaténation des langues. En revanche, l’utilisation d’un vecteur pour représenter la langue ne permet pas une amélioration évidente des résultats dans le cas d’une langue n’ayant pas du tout de données d’apprentissage.


Veyn at PARSEME Shared Task 2018: Recurrent Neural Networks for VMWE Identification
Nicolas Zampieri | Manon Scholivet | Carlos Ramisch | Benoit Favre
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

This paper describes the Veyn system, submitted to the closed track of the PARSEME Shared Task 2018 on automatic identification of verbal multiword expressions (VMWEs). Veyn is based on a sequence tagger using recurrent neural networks. We represent VMWEs using a variant of the begin-inside-outside encoding scheme combined with the VMWE category tag. In addition to the system description, we present development experiments to determine the best tagging scheme. Veyn is freely available, covers 19 languages, and was ranked ninth (MWE-based) and eight (Token-based) among 13 submissions, considering macro-averaged F1 across languages.


Identification of Ambiguous Multiword Expressions Using Sequence Models and Lexical Resources
Manon Scholivet | Carlos Ramisch
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

We present a simple and efficient tagger capable of identifying highly ambiguous multiword expressions (MWEs) in French texts. It is based on conditional random fields (CRF), using local context information as features. We show that this approach can obtain results that, in some cases, approach more sophisticated parser-based MWE identification methods without requiring syntactic trees from a treebank. Moreover, we study how well the CRF can take into account external information coming from a lexicon.