A Survey of MWE Identification Experiments: The Devil is in the Details

Carlos Ramisch, Abigail Walsh, Thomas Blanchard, Shiva Taslimipoor


Abstract
Multiword expression (MWE) identification has been the focus of numerous research papers, especially in the context of the DiMSUM and PARSEME Shared Tasks (STs). This survey analyses 40 MWE identification papers with experiments on data from these STs. We look at corpus selection, pre- and post-processing, MWE encoding, evaluation metrics, statistical significance, and error analyses. We find that these aspects are usually considered minor and/or omitted in the literature. However, they may considerably impact the results and the conclusions drawn from them. Therefore, we advocate for more systematic descriptions of experimental conditions to reduce the risk of misleading conclusions drawn from poorly designed experimental setups.
Anthology ID:
2023.mwe-1.15
Volume:
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Archna Bhatia, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Shiva Taslimipoor
Venue:
MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
106–120
URL:
https://aclanthology.org/2023.mwe-1.15
DOI:
10.18653/v1/2023.mwe-1.15
Cite (ACL):
Carlos Ramisch, Abigail Walsh, Thomas Blanchard, and Shiva Taslimipoor. 2023. A Survey of MWE Identification Experiments: The Devil is in the Details. In Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023), pages 106–120, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
A Survey of MWE Identification Experiments: The Devil is in the Details (Ramisch et al., MWE 2023)
PDF:
https://preview.aclanthology.org/landing_page/2023.mwe-1.15.pdf
Software:
2023.mwe-1.15.software.zip
Video:
https://preview.aclanthology.org/landing_page/2023.mwe-1.15.mp4