Abstract
In this paper, we present MultiVitaminBooster, a system implemented for edition 1.2 of the PARSEME shared task on semi-supervised identification of verbal multiword expressions. We interpret the detection of verbal multiword expressions as a token classification task: deciding, for each token, whether or not it is part of a verbal multiword expression. For this purpose, we train gradient-boosting models on tokens encoded as feature vectors that combine multilingual contextualized word embeddings provided by the XLM-RoBERTa language model with a more traditional linguistic feature set based on context windows and dependency relations. Our system was ranked 7th in the official open-track ranking of the shared task, although an encoding-related bug distorted its results. For this reason, we carry out further unofficial evaluations, in which versions of our system would have achieved higher ranks.
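The sketch below is not the authors' released code; it only illustrates how such a pipeline could be assembled: per-token XLM-RoBERTa embeddings from the Hugging Face transformers library are concatenated with simple window- and dependency-based features and fed to a gradient-boosting classifier. The LightGBM classifier, the CoNLL-U-like input format, the hashed categorical encoding, and all hyper-parameters are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch (not the authors' code): binary token classification for VMWE
# detection with gradient boosting over feature vectors that concatenate
# XLM-RoBERTa embeddings with simple window- and dependency-based features.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from lightgbm import LGBMClassifier  # stand-in gradient-boosting implementation

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")

def embed_tokens(words):
    """Return one XLM-RoBERTa vector per input word (first sub-word piece)."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]  # (pieces, 768)
    vectors = []
    for i in range(len(words)):
        piece = enc.word_ids(0).index(i)  # first sub-word piece of word i
        vectors.append(hidden[piece].numpy())
    return np.stack(vectors)

def featurize(sentence):
    """sentence: list of dicts with 'form', 'upos', 'deprel' and a 0-based
    'head' index (-1 for the root); this input format is an assumption."""
    words = [tok["form"] for tok in sentence]
    emb = embed_tokens(words)
    rows = []
    for i, tok in enumerate(sentence):
        # context window of POS tags around the current token
        window_pos = [sentence[j]["upos"] if 0 <= j < len(sentence) else "PAD"
                      for j in (i - 1, i, i + 1)]
        # dependency relation of the token and POS of its syntactic head
        dep = [tok["deprel"],
               sentence[tok["head"]]["upos"] if tok["head"] >= 0 else "ROOT"]
        # categorical features would be one-hot or hashed properly in practice;
        # Python's hash() is used here only for brevity
        cat = np.array([hash(v) % 1000 for v in window_pos + dep], dtype=float)
        rows.append(np.concatenate([emb[i], cat]))
    return np.stack(rows)

clf = LGBMClassifier(n_estimators=500, learning_rate=0.05)  # placeholder hyper-parameters
# clf.fit(X_train, y_train)                 # y_train: 1 if the token is part of a VMWE, else 0
# predictions = clf.predict(featurize(test_sentence))
```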
- Anthology ID:
- 2020.mwe-1.20
- Volume:
- Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
- Month:
- December
- Year:
- 2020
- Address:
- online
- Editors:
- Stella Markantonatou, John McCrae, Jelena Mitrović, Carole Tiberius, Carlos Ramisch, Ashwini Vaidya, Petya Osenova, Agata Savary
- Venue:
- MWE
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 149–155
- URL:
- https://aclanthology.org/2020.mwe-1.20
- Cite (ACL):
- Sebastian Gombert and Sabine Bartsch. 2020. MultiVitaminBooster at PARSEME Shared Task 2020: Combining Window- and Dependency-Based Features with Multilingual Contextualised Word Embeddings for VMWE Detection. In Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, pages 149–155, online. Association for Computational Linguistics.
- Cite (Informal):
- MultiVitaminBooster at PARSEME Shared Task 2020: Combining Window- and Dependency-Based Features with Multilingual Contextualised Word Embeddings for VMWE Detection (Gombert & Bartsch, MWE 2020)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2020.mwe-1.20.pdf