Mathilde Deletombe
2026
Diversity patterns run deep: Impact of diversity intake on multiword expression identification
Mathilde Deletombe | Manon Scholivet | Louis Estève | Thomas Lavergne | Agata Savary
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
Multiword expressions (MWEs) are a good example of a phenomenon where identification systems struggle to generalise: MWEs present in the test set but absent from the training set are rarely identified. This raises the question of how diverse the test set is relative to the training set, and how this affects performance. We measure how much the diversity of a training corpus increases when individual MWEs from the test corpus are added to it, and how this increase affects MWE identification performance. We measure diversity within a three-dimensional framework and find mostly consistent negative correlations with performance across 14 languages and 8 systems.
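The idea of a diversity increase can be illustrated with a minimal sketch. The abstract does not name the measures that make up the three-dimensional framework, so the two used here (type richness and Shannon entropy over MWE types) are stand-in assumptions, and the function name `diversity_intake` is hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def richness(counts):
    """Number of distinct MWE types with at least one occurrence."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_entropy(counts):
    """Shannon entropy (natural log) of the MWE type distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)


def diversity_intake(train_mwes, test_mwe):
    """Change in each (assumed) diversity measure when one test MWE
    is added to the training MWEs. Hypothetical helper, not the
    paper's implementation."""
    before = Counter(train_mwes)
    after = before.copy()
    after[test_mwe] += 1
    return {
        "richness": richness(after) - richness(before),
        "entropy": shannon_entropy(after) - shannon_entropy(before),
    }


train = ["take place", "take place", "kick the bucket"]
unseen = diversity_intake(train, "spill the beans")  # type absent from train
seen = diversity_intake(train, "take place")         # type already frequent
```

Under these assumptions, an unseen MWE raises both measures, while another occurrence of an already dominant MWE leaves richness unchanged and lowers entropy. Such per-MWE deltas could then be correlated with whether a system identifies the MWE.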