Cengiz Acarturk

Also published as: Cengiz Acartürk

2026

We present SemEval-2026 Task 9, a shared task on online polarization detection, covering 22 languages and comprising over 110K annotated instances. Each data instance is multi-labeled with the presence of polarization, polarization type, and polarization manifestation. Participants were asked to predict labels in three subtasks: (1) detecting the presence of polarization, (2) identifying the type of polarization, and (3) recognizing the polarization manifestation. The three tasks attracted over 1,000 participants worldwide and more than 10k submissions on Codabench. We received final submissions from 67 teams and 69 system description papers. We report the baseline results and analyze the performance of the best-performing systems, highlighting the most common approaches and the most effective methods across different subtasks and languages. The dataset and other resources for this task are publicly available.

We present the MultiplEYE Text Corpus, a large-scale, document-level, multi-parallel resource designed to advance cross-linguistic research on reading and language processing. The corpus provides paragraph-level alignment for texts in 39 languages spanning seven language families and seven scripts. Unlike many existing multilingual corpora, a substantial number of documents were originally written in languages other than English, reducing English-centric bias and supporting more typologically diverse investigations. The texts are carefully selected to balance linguistic richness with experimental feasibility, particularly for eye-tracking-while-reading studies. Developed within a multi-lab initiative, the MultiplEYE Text Corpus follows unified translation, alignment, and experimental design guidelines to ensure cross-linguistic comparability. Its inclusion of texts varying in type and difficulty enables research on discourse-level processing, genre effects, and individual differences across a wide range of languages. The text corpus and accompanying metadata provide a robust foundation for multilingual psycholinguistic and computational modeling research. Data and materials are publicly available at https://doi.org/10.23668/psycharchives.22294.

A Survey of Incorporating Gaze Data into Natural Language Processing Models and Applications
Cengiz Acarturk | Burcu Can | Melike Caglayan | Jamal Abdul Nasir | Cagri Coltekin
Proceedings of the Second International Workshop on Eye-Tracking Resources and Evaluation for Human-Aligned NLP

This study presents a survey of research integrating eye-tracking (gaze) data into Language Models (LMs) as a means of cognitively grounding NLP models and applications in human reading behavior. Although contemporary LMs excel at learning statistical patterns from text, they fundamentally lack human-like reading and comprehension capabilities. Incorporating gaze data may offer a window into cognitive processing, yet its impact on LMs remains underexplored. Addressing a persistent bottleneck, namely, the high cost and limited scale of laboratory eye-tracking, we propose a roadmap consisting of three streams of research for advancing this novel research domain: (1) developing cognitive multimodal corpora, (2) leveraging generative models for gaze synthesis to overcome the data bottleneck caused by the high costs of human eye-tracking, and (3) training LMs with gaze-guided attention mechanisms and input augmentation. Furthermore, we illustrate practical applications in readability assessment, educational analytics, and assistive communication, demonstrating how gaze-informed models can enable adaptive technologies. Finally, we critically examine ongoing challenges, including the lack of data standardization, the misalignment between human and machine language processing, and the urgent ethical imperative for privacy-preserving architectures to protect sensitive biometric gaze data, motivating privacy-aware data practices and model designs for scalable deployment.

Proceedings of the Second International Workshop on Eye-Tracking Resources and Evaluation for Human-Aligned NLP
Cengiz Acartürk | Burcu Can | Jamal Nasir | Çağrı Çöltekin
Proceedings of the Second International Workshop on Eye-Tracking Resources and Evaluation for Human-Aligned NLP

Online polarization poses a growing challenge for democratic discourse, yet most computational social science research remains monolingual, culturally narrow, or event-specific. We introduce POLAR, a multilingual, multicultural, and multi-event dataset with over 110K instances in 22 languages drawn from diverse online platforms and real-world events. Polarization is annotated along three axes, namely detection, type, and manifestation, using a variety of annotation platforms adapted to each cultural context. We conduct two main experiments: (1) fine-tuning six pretrained small language models; and (2) evaluating a range of open and closed large language models in few-shot and zero-shot settings. Results show that while most models perform well on binary polarization detection, they achieve substantially lower performance when predicting polarization types and manifestations. These findings highlight the complex, highly contextual nature of polarization and underscore the need for robust, adaptable approaches in NLP and computational social science. All resources will be released to support further research and effective mitigation of digital polarization globally.

2021

Team ReadMe at CMCL 2021 Shared Task: Predicting Human Reading Patterns by Traditional Oculomotor Control Models and Machine Learning
Alisan Balkoca | Abdullah Algan | Cengiz Acarturk | Çağrı Çöltekin
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

This system description paper describes our participation in the CMCL 2021 shared task on predicting human reading patterns. Our focus in this study is on making use of well-known, traditional oculomotor control models and machine learning systems. We present experiments with a traditional oculomotor control model (the EZ Reader) and two machine learning models (a linear regression model and a recurrent network model), as well as with combinations of the two kinds of model. In all experiments, we test the effects of features well known in the literature for predicting reading patterns, such as frequency, word length, and predictability. Our experiments support earlier findings that such features are useful when combined. Furthermore, we show that although machine learning models perform better than traditional models, a combination of both gives a consistent improvement in predicting multiple eye-tracking variables during reading.
