Abstract
This article focuses on the lemmatization of multiword expressions (MWEs). We propose a deep encoder-decoder architecture that generates, for every word of an MWE, its corresponding part of the lemma, based on the internal context of the MWE. The encoder relies on recurrent networks over (1) the character sequence of each individual word, to capture its morphological properties, and (2) the word sequence of the MWE, to capture lexical and syntactic properties. The decoder, which generates the corresponding part of the lemma for each word of the MWE, is a classical character-level attention-based recurrent model. Our model is evaluated on Italian, French, Polish and Portuguese and performs well for all languages except Polish.
- Anthology ID:
- W19-5117
- Volume:
- Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Venue:
- MWE
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 142–148
- URL:
- https://aclanthology.org/W19-5117
- DOI:
- 10.18653/v1/W19-5117
- Cite (ACL):
- Marine Schmitt and Mathieu Constant. 2019. Neural Lemmatization of Multiword Expressions. In Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), pages 142–148, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Neural Lemmatization of Multiword Expressions (Schmitt & Constant, MWE 2019)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/W19-5117.pdf
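
The abstract frames MWE lemmatization as generating, for each word of the MWE, its corresponding part of the lemma given the other words of the expression. The following is a minimal sketch of that input/output framing only, not of the neural model. The function name `build_training_pairs`, the whitespace tokenization, and the one-to-one word alignment are assumptions made for illustration and are not taken from the paper; the French example is likewise illustrative.

```python
from typing import List, Tuple


def build_training_pairs(
    mwe_form: str, mwe_lemma: str
) -> List[Tuple[str, List[str], str]]:
    """Pair each word of an inflected MWE with its part of the MWE lemma.

    Hypothetical helper: it assumes whitespace tokenization and that the
    inflected form and the lemma contain the same number of words.
    Each returned triple is (word, full MWE word sequence, target part).
    """
    words = mwe_form.split()
    targets = mwe_lemma.split()
    if len(words) != len(targets):
        raise ValueError("this sketch assumes a one-to-one word alignment")
    # The full word sequence is kept as context for every word, mirroring
    # the idea that generation depends on the internal context of the MWE.
    return [(word, words, target) for word, target in zip(words, targets)]


pairs = build_training_pairs("cartes bleues", "carte bleue")
for word, context, target in pairs:
    print(f"{word!r} in context {context} -> {target!r}")
```

The example also shows why the internal context matters: lemmatizing "bleues" in isolation would typically yield the masculine singular "bleu", whereas inside the expression the expected part of the lemma is "bleue", which agrees with the head noun.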