Diana Turmakhan


2025

KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan
Mukhammed Togmanov | Nurdaulet Mukhituly | Diana Turmakhan | Jonibek Mansurov | Maiya Goloburda | Akhmed Sakip | Zhuohan Xie | Yuxia Wang | Bekassyl Syzdykov | Nurkhan Laiyk | Alham Fikri Aji | Ekaterina Kochmar | Preslav Nakov | Fajri Koto
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Although Kazakhstan has a population of twenty million, its culture and language remain underrepresented in the field of natural language processing. Although large language models (LLMs) continue to advance worldwide, progress in the Kazakh language has been limited, as seen in the scarcity of dedicated models and benchmark evaluations. To address this gap, we introduce KazMMLU, the first MMLU-style dataset specifically designed for the Kazakh language. KazMMLU comprises 23,000 questions that span multiple educational levels and cover STEM, humanities, and social sciences, sourced from authentic educational materials and manually validated by native speakers and educators. The dataset includes 10,969 Kazakh questions and 12,031 Russian questions, reflecting Kazakhstan’s bilingual education system and rich local context. Our evaluation of several state-of-the-art multilingual models (Llama3.1, Qwen-2.5, GPT-4, and DeepSeek V3) demonstrates substantial room for improvement, as even the best-performing models struggle to achieve competitive performance in Kazakh and Russian. These findings highlight significant performance gaps compared to high-resource languages. We hope that our dataset will enable further research and development of Kazakh-centric LLMs.

Qorǵau: Evaluating Safety in Kazakh-Russian Bilingual Contexts
Maiya Goloburda | Nurkhan Laiyk | Diana Turmakhan | Yuxia Wang | Mukhammed Togmanov | Jonibek Mansurov | Askhat Sametov | Nurdaulet Mukhituly | Minghan Wang | Daniil Orel | Zain Muhammad Mujahid | Fajri Koto | Timothy Baldwin | Preslav Nakov
Findings of the Association for Computational Linguistics: ACL 2025

Large language models (LLMs) are known to have the potential to generate harmful content, posing risks to users. While significant progress has been made in developing taxonomies for LLM risks and safety evaluation prompts, most studies have focused on monolingual contexts, primarily in English. However, language- and region-specific risks in bilingual contexts are often overlooked, and core findings can diverge from those in monolingual settings. In this paper, we introduce Qorǵau, a novel dataset specifically designed for safety evaluation in Kazakh and Russian, reflecting the unique bilingual context in Kazakhstan, where both Kazakh (a low-resource language) and Russian (a high-resource language) are spoken. Experiments with both multilingual and language-specific LLMs reveal notable differences in safety performance, emphasizing the need for tailored, region-specific datasets to ensure the responsible and safe deployment of LLMs in countries like Kazakhstan. Warning: this paper contains example data that may be offensive, harmful, or biased.

2024

FRAPPE: FRAming, Persuasion, and Propaganda Explorer
Ahmed Sajwani | Alaa El Setohy | Ali Mekky | Diana Turmakhan | Lara Hassan | Mohamed El Zeftawy | Omar El Herraoui | Osama Mohammed Afzal | Qisheng Liao | Tarek Mahmoud | Zain Muhammad Mujahid | Muhammad Umar Salman | Muhammad Arslan Manzoor | Massa Baali | Jakub Piskorski | Nicolas Stefanovitch | Giovanni Da San Martino | Preslav Nakov
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

The abundance of news sources and the urgent demand for reliable information have led to serious concerns about the threat of misleading information. In this paper, we present FRAPPE, a FRAming, Persuasion, and Propaganda Explorer system. FRAPPE goes beyond conventional analysis of news articles and unveils the intricate linguistic techniques used to shape readers’ opinions and emotions. Our system allows users not only to analyze individual articles for their genre, framings, and use of persuasion techniques, but also to draw comparisons between the strategies of persuasion and framing adopted by a diverse pool of news outlets and countries across multiple languages for different topics, thus providing a comprehensive understanding of how information is presented and manipulated. FRAPPE is publicly accessible at https://frappe.streamlit.app/ and a video explaining our system is available at https://www.youtube.com/watch?v=3RlTfSVnZmk