Ilker Kesen
2026
Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish
Yakup Abrek Er | Ilker Kesen | Gözde Gül Şahin | Aykut Erdem
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
We introduce Cetvel, a comprehensive benchmark designed to evaluate large language models (LLMs) in Turkish. Existing Turkish benchmarks often lack task diversity, culturally relevant content, or both. Cetvel addresses these gaps by combining a broad range of discriminative and generative tasks with content that reflects the linguistic and cultural richness of the Turkish language. Cetvel covers 23 tasks grouped into seven categories, including grammatical error correction, machine translation, and question answering rooted in Turkish history and idiomatic language. We evaluate 33 open-weight LLMs (up to 70B parameters) covering different model families and instruction paradigms. Our experiments reveal that Turkish-centric instruction-tuned models generally underperform multilingual or general-purpose models (e.g., Llama 3 and Mistral), despite being tailored for the language. Moreover, we show that tasks such as grammatical error correction and extractive question answering are particularly effective at differentiating model capabilities. Cetvel offers a comprehensive and culturally grounded evaluation suite for advancing the development and assessment of LLMs in Turkish.
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
Pedro Ortiz Suarez | Laurie Burchell | Catherine Arnett | Rafael Mosquera | Sara Hincapié Monsalve | Thom Vaughan | Damian Stewart | Malte Ostendorff | Idris Abdulmumin | Vukosi Marivate | Shamsuddeen Hassan Muhammad | Atnafu Lambebo Tonja | Hend Al-Khalifa | Nadia Ghezaiel Hammouda | Verrah Akinyi Otiende | Tack Hwa Wong | Jakhongir Saydaliev | Melika Nobakhtian | Muhammad Ravi Shulthan Habibi | Chalamalasetti Kranti | Carol Muchemi | Khang Nguyen | Faisal Muhammad Adam | Luis Frentzen Salim | Reem Alqifari | Cynthia Jayne Amol | Joseph Marvin Imperial | Ilker Kesen | Ahmad Mustafid | Pavel Stepachev | Leshem Choshen | David Anugraha | Hamada Nayel | Seid Muhie Yimam | Vallerie Alexandra Putra | My Chiffon Nguyen | Azmine Toushik Wasi | Gouthami Vadithya | Rob Van Der Goot | Lanwenn ar C'horr | Karan Dua | Andrew Yates | Mithil Bangera | Yeshil Bangera | Hitesh Laxmichand Patel | Shu Okabe | Fenal Ashokbhai Ilasariya | Dmitry Gaynullin | Genta Indra Winata | Yiyuan Li | Juan Pablo Martínez | Amit Agarwal | Ikhlasul Akmal Hanif | Raia Abu Ahmad | Esther Adenuga | Filbert Aurelian Tjiaranata | Weerayut Buaphet | Michael Anugraha | Sowmya Vajjala | Benjamin L Rice | Azril Hafizi Amirudin | Jesujoba Oluwadara Alabi | Srikant Panda | Yassine Toughrai | Bruhan Kyomuhendo | Daniel Ruffinelli | Akshata | Manuel Goulão | Ej Zhou | Ingrid Gabriela Franco Ramirez | Cristina Aggazzotti | Konstantin Dobler | Jun Kevin | Quentin Pagès | Nicholas Andrews | Nuhu Ibrahim | Mattes Ruckdeschel | Amr Keleg | Mike Zhang | Casper Rufaro Muziri | Saron Samuel | Sotaro Takeshita | Kun Kerdthaisong | Luca Foppiano | Rasul Dent | Tommaso Green | Ahmad Mustapha Wali | Kamohelo Makaaka | Vicky Feliren | Inshirah Idris | Hande Celikkanat | Abdulhamid Abubakar | Jean Maillard | Benoît Sagot | Thibault Clérice | Kenton Murray | Sarah K. K. Luger
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Language identification (LID) is a fundamental step in curating multilingual corpora. However, LID models still perform poorly for many languages, especially on the noisy and heterogeneous web data often used to train multilingual language models. In this paper, we introduce CommonLID, a community-driven, human-annotated LID benchmark for the web domain, covering 109 languages. Many of the included languages have been previously under-served, making CommonLID a key resource for developing more representative high-quality text corpora. We show CommonLID’s value by using it, alongside five other common evaluation sets, to test eight popular LID models. We analyse our results to situate our contribution and to provide an overview of the state of the art. In particular, we highlight that existing evaluations overestimate LID accuracy for many languages in the web domain. We make CommonLID and the code used to create it available under an open, permissive license.
2025
Multilingual Pretraining for Pixel Language Models
Ilker Kesen | Jonas F. Lotz | Ingo Ziegler | Phillip Rust | Desmond Elliott
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Pixel language models operate directly on images of rendered text, eliminating the need for a fixed vocabulary. While these models have demonstrated strong capabilities for downstream cross-lingual transfer, multilingual pretraining remains underexplored. We introduce PIXEL-M4, a model pretrained on four visually and linguistically diverse languages: English, Hindi, Ukrainian, and Simplified Chinese. Multilingual evaluations on semantic and syntactic tasks show that PIXEL-M4 outperforms an English-only counterpart on non-Latin scripts. Word-level probing analyses confirm that PIXEL-M4 captures rich linguistic features, even in languages not seen during pretraining. Furthermore, an analysis of its hidden representations shows that multilingual pretraining yields a semantic embedding space closely aligned across the languages used for pretraining. This work demonstrates that multilingual pretraining substantially enhances the capability of pixel language models to effectively support a diverse set of languages.
2022
Detecting Euphemisms with Literal Descriptions and Visual Imagery
Ilker Kesen | Aykut Erdem | Erkut Erdem | Iacer Calixto
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)
This paper describes our two-stage system for the Euphemism Detection shared task hosted by the 3rd Workshop on Figurative Language Processing in conjunction with EMNLP 2022. Euphemisms tone down expressions about sensitive or unpleasant issues such as addiction and death. The ambiguous nature of euphemistic words or expressions makes it challenging to detect their actual meaning within a context. In the first stage, we seek to mitigate this ambiguity by incorporating literal descriptions into the input text prompts to our baseline model. This kind of direct supervision yields a remarkable performance improvement. In the second stage, we integrate visual supervision into our system using visual imagery: two sets of images generated by a text-to-image model from the terms and their descriptions. Our experiments demonstrate that visual supervision also gives a statistically significant performance boost. Our system achieved second place with an F1 score of 87.2%, only about 0.9% below the best submission.
CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions
Tayfun Ates | M. Ateşoğlu | Çağatay Yiğit | Ilker Kesen | Mert Kobas | Erkut Erdem | Aykut Erdem | Tilbe Goksun | Deniz Yuret
Findings of the Association for Computational Linguistics: ACL 2022
Humans are able to perceive, understand, and reason about causal events. Developing models with similar physical and causal understanding capabilities is a long-standing goal of artificial intelligence. As a step in this direction, we introduce CRAFT, a new video question answering dataset that requires causal reasoning about physical forces and object interactions. It contains 58K video and question pairs generated from 10K videos from 20 different virtual environments, containing various objects in motion that interact with each other and the scene. Two of the question categories in CRAFT cover the previously studied descriptive and counterfactual questions. Additionally, inspired by the Force Dynamics Theory in cognitive linguistics, we introduce a new causal question category that involves understanding the causal interactions between objects through notions like cause, enable, and prevent. Our results show that even though the questions in CRAFT are easy for humans, the tested baseline models, including existing state-of-the-art methods, do not yet deal with the challenges posed by our benchmark.
Co-authors
- Aykut Erdem 3
- Erkut Erdem 2
- Idris Abdulmumin 1
- Abdulhamid Abubakar 1
- Faisal Muhammad Adam 1
- Esther Adenuga 1
- Amit Agarwal 1
- Cristina Aggazzotti 1
- Raia Abu Ahmad 1
- Akshata 1
- Hend Al-Khalifa 1
- Jesujoba Alabi 1
- Reem Alqifari 1
- Azril Hafizi Amirudin 1
- Cynthia Jayne Amol 1
- Nicholas Andrews 1
- David Anugraha 1
- Michael Anugraha 1
- Catherine Arnett 1
- Tayfun Ates 1
- M. Ateşoğlu 1
- Mithil Bangera 1
- Yeshil Bangera 1
- Weerayut Buaphet 1
- Laurie Burchell 1
- Lanwenn ar C'horr 1
- Iacer Calixto 1
- Hande Celikkanat 1
- Kranti Chalamalasetti 1
- Leshem Choshen 1
- Thibault Clérice 1
- Rasul Dent 1
- Konstantin Dobler 1
- Karan Dua 1
- Desmond Elliott 1
- Yakup Abrek Er 1
- Vicky Feliren 1
- Luca Foppiano 1
- Dmitry Gaynullin 1
- Tilbe Goksun 1
- Manuel Goulão 1
- Tommaso Green 1
- Muhammad Ravi Shulthan Habibi 1
- Nadia Ghezaiel Hammouda 1
- Ikhlasul Akmal Hanif 1
- Nuhu Ibrahim 1
- Inshirah Idris 1
- Fenal Ashokbhai Ilasariya 1
- Joseph Marvin Imperial 1
- Amr Keleg 1
- Kun Kerdthaisong 1
- Jun Kevin 1
- Mert Kobas 1
- Bruhan Kyomuhendo 1
- Yiyuan Li 1
- Jonas F. Lotz 1
- Sarah K. K. Luger 1
- Jean Maillard 1
- Kamohelo Makaaka 1
- Vukosi Marivate 1
- Juan Pablo Martínez 1
- Sara Hincapié Monsalve 1
- Rafael Mosquera 1
- Carol Muchemi 1
- Shamsuddeen Hassan Muhammad 1
- Kenton Murray 1
- Ahmad Mustafid 1
- Casper Rufaro Muziri 1
- Hamada Nayel 1
- Khang Nguyen 1
- My Chiffon Nguyen 1
- Melika Nobakhtian 1
- Shu Okabe 1
- Pedro Ortiz Suarez 1
- Malte Ostendorff 1
- Verrah Akinyi Otiende 1
- Quentin Pagès 1
- Srikant Panda 1
- Hitesh Laxmichand Patel 1
- Vallerie Alexandra Putra 1
- Ingrid Gabriela Franco Ramirez 1
- Benjamin L Rice 1
- Mattes Ruckdeschel 1
- Daniel Ruffinelli 1
- Phillip Rust 1
- Benoît Sagot 1
- Luis Frentzen Salim 1
- Saron Samuel 1
- Jakhongir Saydaliev 1
- Pavel Stepachev 1
- Damian Stewart 1
- Sotaro Takeshita 1
- Filbert Aurelian Tjiaranata 1
- Atnafu Lambebo Tonja 1
- Yassine Toughrai 1
- Gouthami Vadithya 1
- Sowmya Vajjala 1
- Rob Van Der Goot 1
- Thom Vaughan 1
- Ahmad Mustapha Wali 1
- Azmine Toushik Wasi 1
- Genta Indra Winata 1
- Tack Hwa Wong 1
- Andrew Yates 1
- Seid Muhie Yimam 1
- Çağatay Yiğit 1
- Deniz Yuret 1
- Mike Zhang 1
- Ej Zhou 1
- Ingo Ziegler 1
- Gözde Gül Şahin 1