Easy as PIE? Identifying Multi-Word Expressions with LLMs

Kai Golan Hashiloni, Ofri Hefetz, Kfir Bar


Abstract
We investigate the identification of idiomatic expressions—a semantically non-compositional subclass of multiword expressions (MWEs)—in running text using large language models (LLMs) without any fine-tuning. Instead, we adopt a prompt-based approach and evaluate a range of prompting strategies, including zero-shot, few-shot, and chain-of-thought variants, across multiple languages, datasets, and model types. Our experiments show that, with well-crafted prompts, LLMs can perform competitively with supervised models trained on annotated data. These findings highlight the potential of prompt-based LLMs as a flexible and effective alternative for idiomatic expression identification.
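To make the prompting strategies named in the abstract concrete, here is a minimal sketch of how zero-shot, few-shot, and chain-of-thought prompts for idiom identification might be assembled. It is an illustration under stated assumptions, not the authors' prompts: the instruction wording, the `build_prompt` helper, and the example sentences are all hypothetical, and no model call is made.

```python
from typing import Sequence, Tuple

# Hypothetical instruction text; the paper's actual prompt wording is not given here.
INSTRUCTION = (
    "List every idiomatic expression in the input, copied exactly as it "
    "appears. If there is none, answer 'None'."
)


def build_prompt(
    sentence: str,
    examples: Sequence[Tuple[str, str]] = (),
    chain_of_thought: bool = False,
) -> str:
    """Assemble a plain-text prompt.

    No examples -> zero-shot; one or more (sentence, answer) pairs -> few-shot;
    chain_of_thought=True adds a reasoning cue to either variant.
    """
    parts = [INSTRUCTION]
    if chain_of_thought:
        parts.append("Think step by step before giving the final answer.")
    # Each demonstration is a labeled sentence followed by its gold answer.
    for ex_sentence, ex_answer in examples:
        parts.append(f"Sentence: {ex_sentence}\nAnswer: {ex_answer}")
    # The query sentence comes last, with the answer slot left open.
    parts.append(f"Sentence: {sentence}\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = [("She spilled the beans.", "spilled the beans")]
    print(build_prompt("He kicked the bucket last year.", examples=demo))
```

The resulting string would be sent to whichever LLM is being evaluated; comparing the returned spans against gold annotations is left out of this sketch.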
Anthology ID:
2025.emnlp-main.1213
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
23782–23801
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1213/
Cite (ACL):
Kai Golan Hashiloni, Ofri Hefetz, and Kfir Bar. 2025. Easy as PIE? Identifying Multi-Word Expressions with LLMs. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 23782–23801, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Easy as PIE? Identifying Multi-Word Expressions with LLMs (Hashiloni et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1213.pdf
Checklist:
 2025.emnlp-main.1213.checklist.pdf