Diversity patterns run deep: Impact of diversity intake on multiword expression identification

Mathilde Deletombe, Manon Scholivet, Louis Estève, Thomas Lavergne, Agata Savary


Abstract
Multiword expressions (MWEs) are a good example of a phenomenon where identification systems struggle with generalisation: MWEs present in the test set but absent from the training set are rarely identified. This raises the question of how diverse the test set is relative to the training set, and how this affects performance. We set out to measure how much the diversity of a training corpus increases when individual MWEs from the test corpus are added to it, and how this increase impacts MWE identification performance. We measure diversity within a three-dimensional framework and find mostly consistent negative correlations with performance across 14 languages and 8 systems.
Anthology ID:
2026.mwe-1.13
Volume:
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Atul Kr. Ojha, Verginica Barbu Mititelu, Mathieu Constant, Ivelina Stoyanova, A. Seza Doğruöz, Alexandre Rademaker
Venues:
MWE | WS
Publisher:
Association for Computational Linguistics
Pages:
110–116
URL:
https://preview.aclanthology.org/ingest-eacl/2026.mwe-1.13/
Cite (ACL):
Mathilde Deletombe, Manon Scholivet, Louis Estève, Thomas Lavergne, and Agata Savary. 2026. Diversity patterns run deep: Impact of diversity intake on multiword expression identification. In Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026), pages 110–116, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Diversity patterns run deep: Impact of diversity intake on multiword expression identification (Deletombe et al., MWE 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.mwe-1.13.pdf