Cognitive Signatures of Multi-Word Expressions: Reading-Time and Surprisal

Diego Alves, Sergei Bagdasarov, Elke Teich


Abstract
This study investigates whether eye-tracking measures predict if a word is the final token of a multi-word expression (MWE), focusing on two understudied MWE types: fixed expressions (e.g., due to) and phrasal verbs (e.g., turn out). Using mixed-effects logistic regression, we compared tokens in MWE contexts with the same tokens in non-MWE contexts. The results reveal a clear difference in processing between the two types. For fixed expressions, reading-time measures significantly predict MWEhood. In contrast, phrasal verbs show no consistent predictive effects. Additionally, we compared the reading-time models to models that included GPT-2 surprisal as a predictor. While surprisal does predict MWEhood, it fails to capture the distinction between the two MWE types. These findings highlight the need to consider MWE typology in models of formulaic language processing.
Anthology ID:
2026.mwe-1.5
Volume:
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Atul Kr. Ojha, Verginica Barbu Mititelu, Mathieu Constant, Ivelina Stoyanova, A. Seza Doğruöz, Alexandre Rademaker
Venues:
MWE | WS
Publisher:
Association for Computational Linguistics
Pages:
48–53
URL:
https://preview.aclanthology.org/ingest-eacl/2026.mwe-1.5/
Cite (ACL):
Diego Alves, Sergei Bagdasarov, and Elke Teich. 2026. Cognitive Signatures of Multi-Word Expressions: Reading-Time and Surprisal. In Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026), pages 48–53, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Cognitive Signatures of Multi-Word Expressions: Reading-Time and Surprisal (Alves et al., MWE 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.mwe-1.5.pdf