Diversity patterns run deep: Impact of diversity intake on multiword expression identification
Mathilde Deletombe, Manon Scholivet, Louis Estève, Thomas Lavergne, Agata Savary
Abstract
Multiword expressions (MWEs) are good examples of a phenomenon where identification systems struggle with generalisation: MWEs present in the test set but absent from the training set are rarely identified. This raises the question of the diversity of the test set, relative to that of the training set, and how this impacts performance. We set out to measure how much the diversity of a training corpus increases when individual MWEs from the test corpus are added, and how this increase impacts MWE identification performance. We measure diversity across a three-dimension framework and find mostly consistent negative correlations with performance in 14 languages and 8 systems.
- Anthology ID:
- 2026.mwe-1.13
- Volume:
- Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Editors:
- Atul Kr. Ojha, Verginica Barbu Mititelu, Mathieu Constant, Ivelina Stoyanova, A. Seza Doğruöz, Alexandre Rademaker
- Venues:
- MWE | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 110–116
- URL:
- https://preview.aclanthology.org/ingest-eacl/2026.mwe-1.13/
- Cite (ACL):
- Mathilde Deletombe, Manon Scholivet, Louis Estève, Thomas Lavergne, and Agata Savary. 2026. Diversity patterns run deep: Impact of diversity intake on multiword expression identification. In Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026), pages 110–116, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- Diversity patterns run deep: Impact of diversity intake on multiword expression identification (Deletombe et al., MWE 2026)
- PDF:
- https://preview.aclanthology.org/ingest-eacl/2026.mwe-1.13.pdf