Saba Anwar
2026
SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization
Usman Naseem | Robert Geislinger | Ada Ren | Sarah Kohail | Rudy Garrido Veliz | P Sam Sahil | Yiran Zhang | Marco Antonio Stranisci | Idris Abdulmumin | Özge Alacam | Cengiz Acarturk | Aisha Jabr | Saba Anwar | Abinew Ali Ayele | Elena Tutubalina | Aung Kyaw Htet | Xintong Wang | Surendrabikram Thapa | Tanmoy Chakraborty | Dheeraj Kodati | Sahar Moradizeyveh | Firoj Alam | Ye Kyaw Thu | Shantipriya Parida | Ihsan Ayyub Qazi | Lilian Diana Awuor Wanzare | Nelson Odhiambo | Clemencia Siro | Ibrahim Said Ahmad | Adem Chanie Ali | Martin Semmann | Chris Biemann | Shamsuddeen Hassan Muhammad | Seid Muhie Yimam
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Usman Naseem | Robert Geislinger | Ada Ren | Sarah Kohail | Rudy Garrido Veliz | P Sam Sahil | Yiran Zhang | Marco Antonio Stranisci | Idris Abdulmumin | Özge Alacam | Cengiz Acarturk | Aisha Jabr | Saba Anwar | Abinew Ali Ayele | Elena Tutubalina | Aung Kyaw Htet | Xintong Wang | Surendrabikram Thapa | Tanmoy Chakraborty | Dheeraj Kodati | Sahar Moradizeyveh | Firoj Alam | Ye Kyaw Thu | Shantipriya Parida | Ihsan Ayyub Qazi | Lilian Diana Awuor Wanzare | Nelson Odhiambo | Clemencia Siro | Ibrahim Said Ahmad | Adem Chanie Ali | Martin Semmann | Chris Biemann | Shamsuddeen Hassan Muhammad | Seid Muhie Yimam
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
We present SemEval-2026 Task 9, a shared task on online polarization detection, covering 22 languages and comprising over 110K annotated instances. Each data instance is multi-labeled with the presence of polarization, polarization type, and polarization manifestation. Participants were asked to predict labels in three subtasks: (1) detecting the presence of polarization, (2) identifying the type of polarization, and (3) recognizing the polarization manifestation. The three tasks attracted over 1,000 participants worldwide and more than 10k submissions on Codabench. We received final submissions from 67 teams and 69 system description papers. We report the baseline results and analyze the performance of the best-performing systems, highlighting the most common approaches and the most effective methods across different subtasks and languages. The dataset and other resources for this task are publicly available.
POLAR: A Benchmark for Multilingual, Multicultural, and Multi-Event Online Polarization
Usman Naseem | Robert Geislinger | Juan Ren | Sarah Kohail | Rudy Alexandro Garrido Veliz | P Sam Sahil | Yiran Zhang | Idris Abdulmumin | Marco Antonio Stranisci | Özge Alacam | Cengiz Acarturk | Aisha Jabr | Saba Anwar | Abinew Ali Ayele | Simona Frenda | Alessandra Teresa Cignarella | Elena Tutubalina | Oleg Rogov | Aung Kyaw Htet | Xintong Wang | Surendrabikram Thapa | Kritesh Rauniyar | Tanmoy Chakraborty | MD Arfeen Zeeshan | Dheeraj Kodati | Satya Keerthi | Sahar Moradizeyveh | Firoj Alam | Md Arid Hasan | Syed Ishtiaque Ahmed | Ye Kyaw Thu | Shantipriya Parida | Ihsan Ayyub Qazi | Lilian Diana Awuor Wanzare | Nelson Odhiambo Onyango | Clemencia Siro | Jane Wanjiru Kimani | Ibrahim Said Ahmad | Adem Chanie Ali | Martin Semmann | Chris Biemann | Shamsuddeen Hassan Muhammad | Seid Muhie Yimam
Findings of the Association for Computational Linguistics: ACL 2026
Usman Naseem | Robert Geislinger | Juan Ren | Sarah Kohail | Rudy Alexandro Garrido Veliz | P Sam Sahil | Yiran Zhang | Idris Abdulmumin | Marco Antonio Stranisci | Özge Alacam | Cengiz Acarturk | Aisha Jabr | Saba Anwar | Abinew Ali Ayele | Simona Frenda | Alessandra Teresa Cignarella | Elena Tutubalina | Oleg Rogov | Aung Kyaw Htet | Xintong Wang | Surendrabikram Thapa | Kritesh Rauniyar | Tanmoy Chakraborty | MD Arfeen Zeeshan | Dheeraj Kodati | Satya Keerthi | Sahar Moradizeyveh | Firoj Alam | Md Arid Hasan | Syed Ishtiaque Ahmed | Ye Kyaw Thu | Shantipriya Parida | Ihsan Ayyub Qazi | Lilian Diana Awuor Wanzare | Nelson Odhiambo Onyango | Clemencia Siro | Jane Wanjiru Kimani | Ibrahim Said Ahmad | Adem Chanie Ali | Martin Semmann | Chris Biemann | Shamsuddeen Hassan Muhammad | Seid Muhie Yimam
Findings of the Association for Computational Linguistics: ACL 2026
Online polarization poses a growing challenge for democratic discourse, yet most computational social science research remains monolingual, culturally narrow, or event-specific. We introduce POLAR, a multilingual, multicultural, and multi-event dataset with over 110K instances in 22 languages drawn from diverse online platforms and real-world events. Polarization is annotated along three axes, namely detection, type, and manifestation, using a variety of annotation platforms adapted to each cultural context. We conduct two main experiments: (1) fine-tuning six pretrained small language models; and (2) evaluating a range of open and closed large language models in few-shot and zero-shot settings. Results show that while most models perform well on binary polarization detection, they achieve substantially lower performance when predicting polarization types and manifestations. These findings highlight the complex, highly contextual nature of polarization and underscore the need for robust, adaptable approaches in NLP and computational social science. All resources will be released to support further research and effective mitigation of digital polarization globally.
Argument-Based Comparative Question Answering Evaluation Benchmark
Irina Nikishina | Saba Anwar | Nikolay Dolgov | Maria Manina | Daria Ignatenko | Viktor Moskvoretskii | Artem Shelmanov | Tim Baldwin | Chris Biemann
Proceedings of the 13th Workshop on Argument Mining and Reasoning
Irina Nikishina | Saba Anwar | Nikolay Dolgov | Maria Manina | Daria Ignatenko | Viktor Moskvoretskii | Artem Shelmanov | Tim Baldwin | Chris Biemann
Proceedings of the 13th Workshop on Argument Mining and Reasoning
Despite the ability of large language models (LLMs) to generate coherent comparative answers, automatic comparative question answering (CQA) remains challenging due to the absence of standardized evaluation criteria and the high resource demands of manual assessment. To address these problems, this paper proposes a comprehensive evaluation framework designed to assess the quality of CQA summaries using LLMs-as-a-Judge. We formulate 15 evaluation criteria for assessing comparative answers generated by various sources, including LLMs, human experts, and prior work. To capture a diverse range of comparative answers, LLM summaries were generated under various prompting scenarios. We evaluate the effectiveness of our framework using both human assessment and LLMs, demonstrating the consistency between automated and manual evaluations. Finally, we fine-tune Llama-3-8B-Instruct on a dataset generated from the best-performing CQA models in our evaluation.
2025
How to Compare Things Properly? A Study of Argument Relevance in Comparative Question Answering
Irina Nikishina | Saba Anwar | Nikolay Dolgov | Maria Manina | Daria Ignatenko | Artem Shelmanov | Chris Biemann
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Irina Nikishina | Saba Anwar | Nikolay Dolgov | Maria Manina | Daria Ignatenko | Artem Shelmanov | Chris Biemann
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Comparative Question Answering (CQA) lies at the intersection of Question Answering, Argument Mining, and Summarization. It poses unique challenges due to the inherently subjective nature of many questions and the need to integrate diverse perspectives. Although the CQA task can be addressed using recently emerged instruction-following Large Language Models (LLMs), challenges such as hallucinations in their outputs and the lack of transparent argument provenance remain significant limitations.To address these challenges, we construct a manually curated dataset comprising arguments annotated with their relevance. These arguments are further used to answer comparative questions, enabling precise traceability and faithfulness. Furthermore, we define explicit criteria for an “ideal” comparison and introduce a benchmark for evaluating the outputs of various Retrieval-Augmented Generation (RAG) models with respect to argument relevance. All code and data are publicly released to support further research.
2022
More Like This: Semantic Retrieval with Linguistic Information
Steffen Remus | Gregor Wiedemann | Saba Anwar | Fynn Petersen-Frey | Seid Muhie Yimam | Chris Biemann
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)
Steffen Remus | Gregor Wiedemann | Saba Anwar | Fynn Petersen-Frey | Seid Muhie Yimam | Chris Biemann
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)
2021
SCoT: Sense Clustering over Time: a tool for the analysis of lexical change
Christian Haase | Saba Anwar | Seid Muhie Yimam | Alexander Friedrich | Chris Biemann
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
Christian Haase | Saba Anwar | Seid Muhie Yimam | Alexander Friedrich | Chris Biemann
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
We present Sense Clustering over Time (SCoT), a novel network-based tool for analysing lexical change. SCoT represents the meanings of a word as clusters of similar words. It visualises their formation, change, and demise. There are two main approaches to the exploration of dynamic networks: the discrete one compares a series of clustered graphs from separate points in time. The continuous one analyses the changes of one dynamic network over a time-span. SCoT offers a new hybrid solution. First, it aggregates time-stamped documents into intervals and calculates one sense graph per discrete interval. Then, it merges the static graphs to a new type of dynamic semantic neighbourhood graph over time. The resulting sense clusters offer uniquely detailed insights into lexical change over continuous intervals with model transparency and provenance. SCoT has been successfully used in a European study on the changing meaning of ‘crisis’.
2020
Generating Lexical Representations of Frames using Lexical Substitution
Saba Anwar | Artem Shelmanov | Alexander Panchenko | Chris Biemann
Proceedings of the Probability and Meaning Conference (PaM 2020)
Saba Anwar | Artem Shelmanov | Alexander Panchenko | Chris Biemann
Proceedings of the Probability and Meaning Conference (PaM 2020)
Semantic frames are formal linguistic structures describing situations/actions/events, e.g. Commercial transfer of goods. Each frame provides a set of roles corresponding to the situation participants, e.g. Buyer and Goods, and lexical units (LUs) – words and phrases that can evoke this particular frame in texts, e.g. Sell. The scarcity of annotated resources hinders wider adoption of frame semantics across languages and domains. We investigate a simple yet effective method, lexical substitution with word representation models, to automatically expand a small set of frame-annotated sentences with new words for their respective roles and LUs. We evaluate the expansion quality using FrameNet. Contextualized models demonstrate overall superior performance compared to the non-contextualized ones on roles. However, the latter show comparable performance on the task of LU expansion.
2019
HHMM at SemEval-2019 Task 2: Unsupervised Frame Induction using Contextualized Word Embeddings
Saba Anwar | Dmitry Ustalov | Nikolay Arefyev | Simone Paolo Ponzetto | Chris Biemann | Alexander Panchenko
Proceedings of the 13th International Workshop on Semantic Evaluation
Saba Anwar | Dmitry Ustalov | Nikolay Arefyev | Simone Paolo Ponzetto | Chris Biemann | Alexander Panchenko
Proceedings of the 13th International Workshop on Semantic Evaluation
We present our system for semantic frame induction that showed the best performance in Subtask B.1 and finished as the runner-up in Subtask A of the SemEval 2019 Task 2 on unsupervised semantic frame induction (Qasem-iZadeh et al., 2019). Our approach separates this task into two independent steps: verb clustering using word and their context embeddings and role labeling by combining these embeddings with syntactical features. A simple combination of these steps shows very competitive results and can be extended to process other datasets and languages.
Search
Fix author
Co-authors
- Chris Biemann 8
- Seid Muhie Yimam 4
- Artem Shelmanov 3
- Idris Abdulmumin 2
- Cengiz Acarturk 2
- Ibrahim Said Ahmad 2
- Özge Alacam 2
- Firoj Alam 2
- Adem Chanie Ali 2
- Abinew Ali Ayele 2
- Tanmoy Chakraborty 2
- Nikolay Dolgov 2
- Robert Geislinger 2
- Aung Kyaw Htet 2
- Daria Ignatenko 2
- Aisha Jabr 2
- Dheeraj Kodati 2
- Sarah Kohail 2
- Maria Manina 2
- Sahar Moradizeyveh 2
- Shamsuddeen Hassan Muhammad 2
- Usman Naseem 2
- Irina Nikishina 2
- Alexander Panchenko 2
- Shantipriya Parida 2
- Ihsan Ayyub Qazi 2
- P Sam Sahil 2
- Martin Semmann 2
- Clemencia Siro 2
- Marco Antonio Stranisci 2
- Surendrabikram Thapa 2
- Ye Kyaw Thu 2
- Elena Tutubalina 2
- Xintong Wang 2
- Lilian Diana Awuor Wanzare 2
- Yiran Zhang 2
- Syed Ishtiaque Ahmed 1
- Nikolay Arefyev 1
- Timothy Baldwin 1
- Alessandra Teresa Cignarella 1
- Simona Frenda 1
- Alexander Friedrich 1
- Rudy Garrido Veliz 1
- Christian Haase 1
- Md. Arid Hasan 1
- Satya Keerthi 1
- Jane Wanjiru Kimani 1
- Viktor Moskvoretskii 1
- Nelson Odhiambo 1
- Nelson Odhiambo Onyango 1
- Fynn Petersen-Frey 1
- Simone Paolo Ponzetto 1
- Kritesh Rauniyar 1
- Steffen Remus 1
- Ada Ren 1
- Juan Ren 1
- Oleg Rogov 1
- Dmitry Ustalov 1
- Rudy Alexandro Garrido Veliz 1
- Gregor Wiedemann 1
- MD Arfeen Zeeshan 1