Ebru Çavuşoğlu
2026
An Idiom Benchmark for Turkish
Ebru Çavuşoğlu | Cagri Coltekin
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
Despite significant recent advances, idioms, like other forms of figurative language, remain a challenge for natural language processing (NLP). Benchmark corpora are essential for improving how current models understand idioms. However, such corpora are available only for a limited set of languages. In this paper, we introduce our ongoing work on a benchmark corpus of Turkish idioms. The corpus is structured to test both idiom recognition and idiom understanding. It currently consists of 200 instances, with sentences containing idiomatic use, their literal paraphrases, similar sentences with no entailment, and, when possible, non-idiomatic uses of the idiomatic expressions. We describe the methodology used to create the corpus, as well as initial experiments with a selection of LLMs.