Abstract
Recent initiatives such as the PARSEME shared task allowed the rapid development of MWE identification systems. Many of those are based on recent NLP advances, using neural sequence models that take continuous word representations as input. We study two related questions in neural MWE identification: (a) the use of lemmas and/or surface forms as input features, and (b) the use of word-based or character-based embeddings to represent them. Our experiments on Basque, French, and Polish show that character-based representations yield systematically better results than word-based ones. In some cases, character-based representations of surface forms can be used as a proxy for lemmas, depending on the morphological complexity of the language.
- Anthology ID:
- W19-5121
- Volume:
- Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Agata Savary, Carla Parra Escartín, Francis Bond, Jelena Mitrović, Verginica Barbu Mititelu
- Venue:
- MWE
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 169–175
- URL:
- https://aclanthology.org/W19-5121
- DOI:
- 10.18653/v1/W19-5121
- Cite (ACL):
- Nicolas Zampieri, Carlos Ramisch, and Geraldine Damnati. 2019. The Impact of Word Representations on Sequential Neural MWE Identification. In Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), pages 169–175, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- The Impact of Word Representations on Sequential Neural MWE Identification (Zampieri et al., MWE 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/W19-5121.pdf