Saba Anwar
2026
POLAR: A Benchmark for Multilingual, Multicultural, and Multi-Event Online Polarization
Usman Naseem | Robert Geislinger | Juan Ren | Sarah Kohail | Rudy Alexandro Garrido Veliz | P Sam Sahil | Yiran Zhang | Idris Abdulmumin | Marco Antonio Stranisci | Özge Alacam | Cengiz Acarturk | Aisha Jabr | Saba Anwar | Abinew Ali Ayele | Simona Frenda | Alessandra Teresa Cignarella | Elena Tutubalina | Oleg Rogov | Aung Kyaw Htet | Xintong Wang | Surendrabikram Thapa | Kritesh Rauniyar | Tanmoy Chakraborty | MD Arfeen Zeeshan | Dheeraj Kodati | Satya Keerthi | Sahar Moradizeyveh | Firoj Alam | Md Arid Hasan | Syed Ishtiaque Ahmed | Ye Kyaw Thu | Shantipriya Parida | Ihsan Ayyub Qazi | Lilian Diana Awuor Wanzare | Nelson Odhiambo Onyango | Clemencia Siro | Jane Wanjiru Kimani | Ibrahim Said Ahmad | Adem Chanie Ali | Martin Semmann | Chris Biemann | Shamsuddeen Hassan Muhammad | Seid Muhie Yimam
Findings of the Association for Computational Linguistics: ACL 2026
Online polarization poses a growing challenge for democratic discourse, yet most computational social science research remains monolingual, culturally narrow, or event-specific. We introduce POLAR, a multilingual, multicultural, and multi-event dataset with over 110K instances in 22 languages drawn from diverse online platforms and real-world events. Polarization is annotated along three axes, namely detection, type, and manifestation, using a variety of annotation platforms adapted to each cultural context. We conduct two main experiments: (1) fine-tuning six pretrained small language models; and (2) evaluating a range of open and closed large language models in few-shot and zero-shot settings. Results show that while most models perform well on binary polarization detection, they achieve substantially lower performance when predicting polarization types and manifestations. These findings highlight the complex, highly contextual nature of polarization and underscore the need for robust, adaptable approaches in NLP and computational social science. All resources will be released to support further research and effective mitigation of digital polarization globally.
2025
How to Compare Things Properly? A Study of Argument Relevance in Comparative Question Answering
Irina Nikishina | Saba Anwar | Nikolay Dolgov | Maria Manina | Daria Ignatenko | Artem Shelmanov | Chris Biemann
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Comparative Question Answering (CQA) lies at the intersection of Question Answering, Argument Mining, and Summarization. It poses unique challenges due to the inherently subjective nature of many questions and the need to integrate diverse perspectives. Although the CQA task can be addressed using recently emerged instruction-following Large Language Models (LLMs), challenges such as hallucinations in their outputs and the lack of transparent argument provenance remain significant limitations. To address these challenges, we construct a manually curated dataset comprising arguments annotated with their relevance. These arguments are further used to answer comparative questions, enabling precise traceability and faithfulness. Furthermore, we define explicit criteria for an “ideal” comparison and introduce a benchmark for evaluating the outputs of various Retrieval-Augmented Generation (RAG) models with respect to argument relevance. All code and data are publicly released to support further research.
2022
More Like This: Semantic Retrieval with Linguistic Information
Steffen Remus | Gregor Wiedemann | Saba Anwar | Fynn Petersen-Frey | Seid Muhie Yimam | Chris Biemann
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)
2021
SCoT: Sense Clustering over Time: a tool for the analysis of lexical change
Christian Haase | Saba Anwar | Seid Muhie Yimam | Alexander Friedrich | Chris Biemann
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
We present Sense Clustering over Time (SCoT), a novel network-based tool for analysing lexical change. SCoT represents the meanings of a word as clusters of similar words and visualises their formation, change, and demise. There are two main approaches to the exploration of dynamic networks: the discrete one compares a series of clustered graphs from separate points in time, while the continuous one analyses the changes of one dynamic network over a time-span. SCoT offers a new hybrid solution. First, it aggregates time-stamped documents into intervals and calculates one sense graph per discrete interval. Then, it merges the static graphs into a new type of dynamic semantic neighbourhood graph over time. The resulting sense clusters offer uniquely detailed insights into lexical change over continuous intervals, with model transparency and provenance. SCoT has been successfully used in a European study on the changing meaning of ‘crisis’.
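The hybrid discrete/continuous procedure described in the SCoT abstract can be illustrated with a toy Python sketch. This is not the SCoT implementation: the helper names, the fixed-width year bins, co-occurrence counts as a stand-in for word similarity, and connected components as a stand-in for sense clustering are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def bin_documents(docs, width):
    """Discrete step: group (year, tokens) pairs into fixed-width intervals."""
    bins = defaultdict(list)
    for year, tokens in docs:
        bins[year - year % width].append(tokens)
    return dict(bins)


def sense_graph(sentences, target, top_n=5):
    """Edges of a static graph for one interval (toy version).

    Nodes are the top_n words co-occurring with the target (a crude stand-in
    for its most similar words); two nodes are linked if they appear together
    in a sentence that contains the target.
    """
    counts = defaultdict(int)
    for toks in sentences:
        if target in toks:
            for w in set(toks) - {target}:
                counts[w] += 1
    nodes = set(sorted(counts, key=lambda w: (-counts[w], w))[:top_n])
    edges = set()
    for toks in sentences:
        if target in toks:
            edges.update(combinations(sorted(nodes & set(toks)), 2))
    return edges


def merge_over_time(graphs):
    """Continuous step: one dynamic graph whose edges record their intervals."""
    edge_times = defaultdict(set)
    for start, edges in graphs.items():
        for edge in edges:
            edge_times[edge].add(start)
    return {edge: sorted(times) for edge, times in edge_times.items()}


def clusters(dynamic_edges):
    """Connected components of the merged graph (stand-in for sense clustering)."""
    adj = defaultdict(set)
    for a, b in dynamic_edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for node in sorted(adj):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps


# Hypothetical toy corpus of (year, tokens) pairs.
docs = [
    (1990, ["crisis", "bank", "debt"]),
    (1992, ["crisis", "bank", "debt", "market"]),
    (2005, ["crisis", "patient", "fever"]),
    (2007, ["crisis", "bank", "market"]),
]
graphs = {start: sense_graph(sents, "crisis")
          for start, sents in bin_documents(docs, 10).items()}
dynamic = merge_over_time(graphs)
print(dynamic[("bank", "market")])  # [1990, 2000]
print(clusters(dynamic))  # [['bank', 'debt', 'market'], ['fever', 'patient']]
```

In this toy corpus the bank/market edge persists across both decades, while the fever/patient neighbourhood only forms in the 2000s; keeping interval labels on each merged edge is what preserves that provenance.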
2020
Generating Lexical Representations of Frames using Lexical Substitution
Saba Anwar | Artem Shelmanov | Alexander Panchenko | Chris Biemann
Proceedings of the Probability and Meaning Conference (PaM 2020)
Semantic frames are formal linguistic structures describing situations/actions/events, e.g. Commercial transfer of goods. Each frame provides a set of roles corresponding to the situation participants, e.g. Buyer and Goods, and lexical units (LUs) – words and phrases that can evoke this particular frame in texts, e.g. Sell. The scarcity of annotated resources hinders wider adoption of frame semantics across languages and domains. We investigate a simple yet effective method, lexical substitution with word representation models, to automatically expand a small set of frame-annotated sentences with new words for their respective roles and LUs. We evaluate the expansion quality using FrameNet. Contextualized models demonstrate overall superior performance compared to the non-contextualized ones on roles. However, the latter show comparable performance on the task of LU expansion.
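The expansion idea, pooling substitutes for seed words across frame-annotated sentences, can be sketched as below. This is a hypothetical illustration, not the method evaluated in the paper: the substitution model is replaced by a hand-written lookup table, and summing substitute scores across sentences is an assumed aggregation rule.

```python
from collections import defaultdict


def expand_frame(seed_sentences, substitutes, top_k=3):
    """Pool substitutes across annotated sentences into new lexical candidates.

    seed_sentences: list of (sentence_id, seed_word) for one role or LU.
    substitutes: hypothetical map sentence_id -> [(candidate, score), ...],
    standing in for the output of a lexical-substitution model.
    Candidates that are already seeds are skipped; the rest are ranked
    by their summed score (ties broken alphabetically).
    """
    seeds = {w for _, w in seed_sentences}
    totals = defaultdict(float)
    for sid, _ in seed_sentences:
        for cand, score in substitutes.get(sid, []):
            if cand not in seeds:
                totals[cand] += score
    ranked = sorted(totals, key=lambda c: (-totals[c], c))
    return ranked[:top_k]


# Hypothetical substitutes for the seed LU "sell" in two annotated sentences.
seeds = [("s1", "sell"), ("s2", "sell")]
subs = {
    "s1": [("vend", 0.6), ("trade", 0.3), ("buy", 0.1)],
    "s2": [("trade", 0.5), ("auction", 0.4), ("sell", 0.1)],
}
print(expand_frame(seeds, subs))  # ['trade', 'vend', 'auction']
```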
2019
HHMM at SemEval-2019 Task 2: Unsupervised Frame Induction using Contextualized Word Embeddings
Saba Anwar | Dmitry Ustalov | Nikolay Arefyev | Simone Paolo Ponzetto | Chris Biemann | Alexander Panchenko
Proceedings of the 13th International Workshop on Semantic Evaluation
We present our system for semantic frame induction, which showed the best performance in Subtask B.1 and finished as the runner-up in Subtask A of SemEval-2019 Task 2 on unsupervised semantic frame induction (QasemiZadeh et al., 2019). Our approach separates this task into two independent steps: verb clustering using word embeddings and their context embeddings, and role labeling by combining these embeddings with syntactic features. A simple combination of these steps yields very competitive results and can be extended to process other datasets and languages.
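The first step, clustering verb occurrences by their word and context embeddings, can be illustrated with a small Python sketch. It is not the submitted HHMM system: concatenating a verb vector with its mean context vector, average-linkage agglomerative clustering over cosine similarity, the 0.5 threshold, and the 2-dimensional embeddings are all hypothetical choices.

```python
import math


def occurrence_vector(verb_vec, context_vecs):
    """Verb embedding followed by the mean embedding of its context words."""
    dim = len(context_vecs[0])
    mean = [sum(v[i] for v in context_vecs) / len(context_vecs) for i in range(dim)]
    return list(verb_vec) + mean


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def agglomerate(vectors, threshold):
    """Average-linkage agglomerative clustering on cosine similarity.

    Repeatedly merges the two most similar clusters until no pair has an
    average similarity of at least threshold. Returns lists of indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def link(c1, c2):
        sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, pair = None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = link(clusters[a], clusters[b])
                if best is None or s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair  # a < b, so deleting index b leaves index a valid
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Hypothetical 2-d embeddings for four verb occurrences.
occurrences = [
    occurrence_vector([1.0, 0.0], [[0.9, 0.1], [1.0, 0.0]]),  # "buy goods"
    occurrence_vector([0.9, 0.1], [[1.0, 0.0]]),              # "purchase items"
    occurrence_vector([0.0, 1.0], [[0.1, 0.9]]),              # "run fast"
    occurrence_vector([0.1, 0.9], [[0.0, 1.0], [0.2, 0.8]]),  # "sprint quickly"
]
frames = agglomerate(occurrences, threshold=0.5)
print(frames)  # [[0, 1], [2, 3]]
```

Each resulting cluster plays the role of an induced frame; the threshold controls how coarse the frames become.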
Co-authors
- Chris Biemann 6
- Seid Muhie Yimam 3
- Alexander Panchenko 2
- Artem Shelmanov 2
- Idris Abdulmumin 1
- Cengiz Acarturk 1
- Ibrahim Said Ahmad 1
- Syed Ishtiaque Ahmed 1
- Özge Alacam 1
- Firoj Alam 1
- Adem Chanie Ali 1
- Nikolay Arefyev 1
- Abinew Ali Ayele 1
- Tanmoy Chakraborty 1
- Alessandra Teresa Cignarella 1
- Nikolay Dolgov 1
- Simona Frenda 1
- Alexander Friedrich 1
- Robert Geislinger 1
- Christian Haase 1
- Md. Arid Hasan 1
- Aung Kyaw Htet 1
- Daria Ignatenko 1
- Aisha Jabr 1
- Satya Keerthi 1
- Jane Wanjiru Kimani 1
- Dheeraj Kodati 1
- Sarah Kohail 1
- Maria Manina 1
- Sahar Moradizeyveh 1
- Shamsuddeen Hassan Muhammad 1
- Usman Naseem 1
- Irina Nikishina 1
- Nelson Odhiambo Onyango 1
- Shantipriya Parida 1
- Fynn Petersen-Frey 1
- Simone Paolo Ponzetto 1
- Ihsan Ayyub Qazi 1
- Kritesh Rauniyar 1
- Steffen Remus 1
- Juan Ren 1
- Oleg Rogov 1
- P Sam Sahil 1
- Martin Semmann 1
- Clemencia Siro 1
- Marco Antonio Stranisci 1
- Surendrabikram Thapa 1
- Ye Kyaw Thu 1
- Elena Tutubalina 1
- Dmitry Ustalov 1
- Rudy Alexandro Garrido Veliz 1
- Xintong Wang 1
- Lilian Diana Awuor Wanzare 1
- Gregor Wiedemann 1
- MD Arfeen Zeeshan 1
- Yiran Zhang 1