Nicolas Zampieri


2019

pdf bib
The Impact of Word Representations on Sequential Neural MWE Identification
Nicolas Zampieri | Carlos Ramisch | Geraldine Damnati
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

Recent initiatives such as the PARSEME shared task allowed the rapid development of MWE identification systems. Many of those are based on recent NLP advances, using neural sequence models that take continuous word representations as input. We study two related questions in neural MWE identification: (a) the use of lemmas and/or surface forms as input features, and (b) the use of word-based or character-based embeddings to represent them. Our experiments on Basque, French, and Polish show that character-based representations yield systematically better results than word-based ones. In some cases, character-based representations of surface forms can be used as a proxy for lemmas, depending on the morphological complexity of the language.

2018

pdf bib
Veyn at PARSEME Shared Task 2018: Recurrent Neural Networks for VMWE Identification
Nicolas Zampieri | Manon Scholivet | Carlos Ramisch | Benoit Favre
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

This paper describes the Veyn system, submitted to the closed track of the PARSEME Shared Task 2018 on automatic identification of verbal multiword expressions (VMWEs). Veyn is based on a sequence tagger using recurrent neural networks. We represent VMWEs using a variant of the begin-inside-outside encoding scheme combined with the VMWE category tag. In addition to the system description, we present development experiments to determine the best tagging scheme. Veyn is freely available, covers 19 languages, and was ranked ninth (MWE-based) and eight (Token-based) among 13 submissions, considering macro-averaged F1 across languages.