Yiran Zhang
2026
POLAR: A Benchmark for Multilingual, Multicultural, and Multi-Event Online Polarization
Usman Naseem | Robert Geislinger | Juan Ren | Sarah Kohail | Rudy Alexandro Garrido Veliz | P Sam Sahil | Yiran Zhang | Idris Abdulmumin | Marco Antonio Stranisci | Özge Alacam | Cengiz Acarturk | Aisha Jabr | Saba Anwar | Abinew Ali Ayele | Simona Frenda | Alessandra Teresa Cignarella | Elena Tutubalina | Oleg Rogov | Aung Kyaw Htet | Xintong Wang | Surendrabikram Thapa | Kritesh Rauniyar | Tanmoy Chakraborty | MD Arfeen Zeeshan | Dheeraj Kodati | Satya Keerthi | Sahar Moradizeyveh | Firoj Alam | Md Arid Hasan | Syed Ishtiaque Ahmed | Ye Kyaw Thu | Shantipriya Parida | Ihsan Ayyub Qazi | Lilian Diana Awuor Wanzare | Nelson Odhiambo Onyango | Clemencia Siro | Jane Wanjiru Kimani | Ibrahim Said Ahmad | Adem Chanie Ali | Martin Semmann | Chris Biemann | Shamsuddeen Hassan Muhammad | Seid Muhie Yimam
Findings of the Association for Computational Linguistics: ACL 2026
Online polarization poses a growing challenge for democratic discourse, yet most computational social science research remains monolingual, culturally narrow, or event-specific. We introduce POLAR, a multilingual, multicultural, and multi-event dataset with over 110K instances in 22 languages drawn from diverse online platforms and real-world events. Polarization is annotated along three axes, namely detection, type, and manifestation, using a variety of annotation platforms adapted to each cultural context. We conduct two main experiments: (1) fine-tuning six pretrained small language models; and (2) evaluating a range of open and closed large language models in few-shot and zero-shot settings. Results show that while most models perform well on binary polarization detection, they achieve substantially lower performance when predicting polarization types and manifestations. These findings highlight the complex, highly contextual nature of polarization and underscore the need for robust, adaptable approaches in NLP and computational social science. All resources will be released to support further research and effective mitigation of digital polarization globally.
2025
TurnBench-MS: A Benchmark for Evaluating Multi-Turn, Multi-Step Reasoning in Large Language Models
Yiran Zhang | Mo Wang | Xiaoyang Li | Kaixuan Ren | Chencheng Zhu | Usman Naseem
Findings of the Association for Computational Linguistics: EMNLP 2025
Despite impressive advances in large language models (LLMs), existing benchmarks often focus on single-turn or single-step tasks, failing to capture the kind of iterative reasoning required in real-world settings. To address this limitation, we introduce **TurnBench**, a novel benchmark that evaluates multi-turn, multi-step reasoning through an interactive code-breaking task inspired by the “Turing Machine Board Game.” In each episode, a model must uncover hidden logical or arithmetic rules by making sequential guesses, receiving structured feedback, and integrating clues across multiple rounds. This dynamic setup requires models to reason over time, adapt based on past information, and maintain consistency across steps—capabilities underexplored in current benchmarks. TurnBench includes two modes: *Classic*, which tests standard reasoning, and *Nightmare*, which introduces increased complexity and requires robust inferential chains. To support fine-grained analysis, we provide ground-truth annotations for intermediate reasoning steps. Our evaluation of state-of-the-art LLMs reveals significant gaps: the best model achieves 84% accuracy in Classic mode, but performance drops to 18% in Nightmare mode. In contrast, human participants achieve 100% in both, underscoring the challenge TurnBench poses to current models. By incorporating feedback loops and hiding task rules, TurnBench reduces contamination risks and provides a rigorous testbed for diagnosing and advancing multi-step, multi-turn reasoning in LLMs.
Alignment of Large Language Models with Human Preferences and Values
Usman Naseem | Gautam Siddharth Kashyap | Kaixuan Ren | Yiran Zhang | Utsav Maskey | Juan Ren | Afrozah Nadeem
Proceedings of the 23rd Annual Workshop of the Australasian Language Technology Association
Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their reliability and alignment with human expectations remain unresolved challenges. This tutorial introduces the foundations of alignment and provides participants with a conceptual and practical understanding of the field. Core principles such as values, safety, reasoning, and pluralism will be presented through intuitive explanations, worked examples, and case studies. The aim is to equip attendees with the ability to reason about alignment goals, understand how existing methods operate in practice, and critically evaluate their strengths and limitations.
From Generation to Detection: A Multimodal Multi-Task Dataset for Benchmarking Health Misinformation
Zhihao Zhang | Yiran Zhang | Xiyue Zhou | Liting Huang | Imran Razzak | Preslav Nakov | Usman Naseem
Findings of the Association for Computational Linguistics: EMNLP 2025
Infodemics and health misinformation have a significant negative impact on individuals and society, exacerbating confusion and increasing hesitancy in adopting recommended health measures. Recent advancements in generative AI, capable of producing realistic, human-like text and images, have significantly accelerated the spread and expanded the reach of health misinformation, resulting in an alarming surge in its dissemination. To combat infodemics, most existing work has focused on developing misinformation datasets from social media and fact-checking platforms, but has faced limitations in topical coverage, inclusion of AI-generated content, and accessibility of raw content. To address these gaps, we present MM-Health, a large-scale multimodal misinformation dataset in the health domain consisting of 34,746 news articles encompassing both textual and visual information. MM-Health includes human-generated multimodal information (5,776 articles) and AI-generated multimodal information (28,880 articles) from various SOTA generative AI models. Additionally, we benchmarked our dataset on three tasks (reliability checks, originality checks, and fine-grained AI detection), demonstrating that existing SOTA models struggle to accurately distinguish the reliability and origin of information. Our dataset aims to support the development of misinformation detection across various health scenarios, facilitating the detection of human- and machine-generated content at multimodal levels.
Co-authors
- Usman Naseem 4
- Kaixuan Ren 2
- Juan Ren 2
- Idris Abdulmumin 1
- Cengiz Acarturk 1
- Ibrahim Said Ahmad 1
- Syed Ishtiaque Ahmed 1
- Özge Alacam 1
- Firoj Alam 1
- Adem Chanie Ali 1
- Saba Anwar 1
- Abinew Ali Ayele 1
- Chris Biemann 1
- Tanmoy Chakraborty 1
- Alessandra Teresa Cignarella 1
- Simona Frenda 1
- Robert Geislinger 1
- Md. Arid Hasan 1
- Aung Kyaw Htet 1
- Liting Huang 1
- Aisha Jabr 1
- Gautam Siddharth Kashyap 1
- Satya Keerthi 1
- Jane Wanjiru Kimani 1
- Dheeraj Kodati 1
- Sarah Kohail 1
- Xiaoyang Li 1
- Utsav Maskey 1
- Sahar Moradizeyveh 1
- Shamsuddeen Hassan Muhammad 1
- Afrozah Nadeem 1
- Preslav Nakov 1
- Nelson Odhiambo Onyango 1
- Shantipriya Parida 1
- Ihsan Ayyub Qazi 1
- Kritesh Rauniyar 1
- Imran Razzak 1
- Oleg Rogov 1
- P Sam Sahil 1
- Martin Semmann 1
- Clemencia Siro 1
- Marco Antonio Stranisci 1
- Surendrabikram Thapa 1
- Ye Kyaw Thu 1
- Elena Tutubalina 1
- Rudy Alexandro Garrido Veliz 1
- Mo Wang 1
- Xintong Wang 1
- Lilian Diana Awuor Wanzare 1
- Seid Muhie Yimam 1
- MD Arfeen Zeeshan 1
- Zhihao Zhang 1
- Xiyue Zhou 1
- Chencheng Zhu 1