2025
Toward Culturally-Aware Arabic Debate Platforms with NLP Support
Khalid Al Khatib | Mohammad Khader
Proceedings of The Third Arabic Natural Language Processing Conference
Despite the growing importance of online discourse, Arabic-speaking communities lack platforms that support structured, culturally grounded debate. Mainstream social media rarely fosters constructive engagement, often leading to polarization and superficial exchanges. This paper proposes the development of a culturally aware debate platform tailored to the values and traditions of Arabic-speaking users, with a focus on leveraging advances in natural language processing (NLP). We present findings from a user survey that explores experiences with existing debate tools and expectations for future platforms. In addition, we analyze 30,000 English-language debate topics using large language models (LLMs) to assess their cultural relevance and appropriateness for Arab audiences. We further examine the ability of LLMs to generate new culturally resonant debate topics, contributing to the emerging tasks of culture-aware topic assessment and generation. Finally, we propose a theoretical and technical framework for building an NLP-supported Arabic debate platform. Our work highlights the urgent need for culturally sensitive NLP resources that foster critical thinking, digital literacy, and meaningful deliberation in Arabic.
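As a concrete illustration of the topic-assessment task described in this abstract, the following is a minimal sketch of LLM-based cultural-relevance classification. The model name, prompt wording, and the three-way label set are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch of culture-aware topic assessment with an LLM.
# Assumptions: the OpenAI v1 SDK, an OPENAI_API_KEY in the environment,
# and a three-way label set; the paper does not prescribe these details.
from openai import OpenAI

client = OpenAI()

LABELS = {"relevant", "adaptable", "irrelevant"}

def assess_topic(topic: str) -> str:
    """Ask the model whether an English debate topic suits Arab audiences."""
    prompt = (
        "Classify the following debate topic by its cultural relevance and "
        "appropriateness for Arabic-speaking audiences. Answer with exactly "
        "one word: relevant, adaptable, or irrelevant.\n\n"
        f"Topic: {topic}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    label = response.choices[0].message.content.strip().lower()
    return label if label in LABELS else "irrelevant"  # conservative fallback

if __name__ == "__main__":
    print(assess_topic("Should school uniforms be mandatory?"))
```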
Transfer or Translate? Argument Mining in Arabic with No Native Annotations
Sara Nabhani | Khalid Al Khatib
Proceedings of The Third Arabic Natural Language Processing Conference
Argument mining for Arabic remains underexplored, largely due to the scarcity of annotated corpora. To address this gap, we examine the effectiveness of cross-lingual transfer from English. Using the English Persuasive Essays (PE) corpus, annotated with argumentative components (Major Claim, Claim, and Premise), we explore several transfer strategies: training encoder-based multilingual and monolingual models on English data, machine-translated Arabic data, and their combination. We further assess the impact of annotation noise introduced during translation by manually correcting portions of the projected training data. In addition, we investigate the potential of prompting large language models (LLMs) for the task. Experiments on a manually corrected Arabic test set show that monolingual models trained on translated data achieve the strongest performance, with further improvements from small-scale manual correction of training examples.
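A minimal sketch of the best-performing configuration reported above (a monolingual model trained on translated data) follows, assuming the Hugging Face transformers/datasets stack, sentence-level component labels, and AraBERT as the Arabic encoder; none of these specifics are stated in the abstract.

```python
# Translate-train sketch: fine-tune an Arabic encoder on machine-translated
# PE data for argument-component classification. Model choice, data format,
# and hyperparameters are assumptions for illustration only.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["MajorClaim", "Claim", "Premise"]
MODEL = "aubmindlab/bert-base-arabertv2"  # assumed monolingual Arabic encoder

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=len(LABELS))

# Two toy units standing in for the machine-translated PE corpus.
train = Dataset.from_dict({
    "text": ["التعليم حق أساسي لكل إنسان",   # a (major) claim
             "لأن التعليم يحسن فرص العمل"],  # a premise
    "label": [0, 2],
}).map(lambda ex: tokenizer(ex["text"], truncation=True,
                            padding="max_length", max_length=128))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train,
).train()
```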
Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design
Elena Musi | Nadin Kökciyan | Khalid Al Khatib | Davide Ceolin | Emmanuelle Dietz | Klara Maximiliane Gutekunst | Annette Hautli-Janisz | Cristián Santibáñez | Jodi Schneider | Jonas Scholz | Cor Steging | Jacky Visser | Henning Wachsmuth
Proceedings of the 12th Argument Mining Workshop
In this position paper, we advocate for the development of conversational technology that is inherently designed to support and facilitate argumentative processes. We argue that, at present, large language models (LLMs) are inadequate for this purpose, and we propose an ideal technology design aimed at enhancing argumentative skills. This involves re-framing LLMs as tools to exercise our critical thinking skills rather than replacing them. We introduce the concept of reasonable parrots that embody the fundamental principles of relevance, responsibility, and freedom, and that interact through argumentative dialogical moves. These principles and moves arise out of millennia of work in argumentation theory and should serve as the starting point for LLM-based technology that incorporates basic principles of argumentation.
Multi-Class versus Means-End: Assessing Classification Approaches for Argument Patterns
Maximilian Heinrich | Khalid Al Khatib | Benno Stein
Proceedings of the 12th Argument Mining Workshop
In the study of argumentation, the schemes introduced by Walton et al. (2008) represent a significant advancement in understanding and analyzing the structure and function of arguments. Walton’s framework is particularly valuable for computational reasoning, as it facilitates the identification of argument patterns and the reconstruction of enthymemes. Despite its practical utility, automatically identifying these schemes remains a challenging problem. To aid human annotators, Visser et al. (2021) developed a decision tree for scheme classification. Building on this foundation, we propose a means-end approach to argument scheme classification that systematically leverages expert knowledge—encoded in a decision tree—to guide language models through a complex classification task. We assess the effectiveness of the means-end approach by conducting a comprehensive comparison with a standard multi-class approach across two datasets, applying both prompting and supervised learning methods to each approach. Our results indicate that the means-end approach, when combined with supervised learning, achieves scores only slightly lower than those of the multi-class classification approach. At the same time, the means-end approach enhances explainability by identifying the specific steps in the decision tree that pose the greatest challenges for each scheme—offering valuable insights for refining the overall means-end classification process.
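To make the means-end idea concrete, here is a sketch in which an LLM answers one yes/no question per node of an expert decision tree until a scheme label is reached. The two-question tree and the keyword stub are toys, not Visser et al.'s (2021) tree or the paper's prompts.

```python
# Means-end sketch: traverse a decision tree, delegating each yes/no node
# to a model call. Leaves are scheme labels; internal nodes are questions.
def ask_llm(question: str, argument: str) -> bool:
    """Stand-in for a yes/no LLM call (replace with a real client).
    A crude keyword heuristic keeps the sketch runnable end to end."""
    if "expert" in question.lower():
        return "expert" in argument.lower()
    if "consequences" in question.lower():
        return "will lead to" in argument.lower()
    return False

TOY_TREE = {
    "question": "Does the argument cite an expert's opinion?",
    "yes": "Argument from Expert Opinion",
    "no": {
        "question": "Does it predict consequences of an action?",
        "yes": "Argument from Consequences",
        "no": "Other",
    },
}

def classify(argument: str, node=TOY_TREE) -> str:
    while isinstance(node, dict):  # walk until a leaf (a scheme label)
        node = node["yes" if ask_llm(node["question"], argument) else "no"]
    return node

print(classify("Dr. Sara, an expert in economics, says tariffs raise prices."))
```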
Storytelling in Argumentative Discussions: Exploring the Use of Narratives in ChangeMyView
Sara Nabhani | Khalid Al Khatib | Federico Pianzola | Malvina Nissim
Proceedings of the 12th Argument Mining Workshop
Psychological research has long suggested that storytelling can shape beliefs and behaviors by fostering emotional engagement and narrative transportation. However, it remains unclear whether these effects extend to online argumentative discourse. In this paper, we examine the role of narrative in real-world argumentation using discussions from the ChangeMyView subreddit. Leveraging an automatic story detection model, we analyze how narrative use varies across persuasive comments, user types, discussion outcomes, and the kinds of change being sought. While narrative appears more frequently in some contexts, it is not consistently linked to successful persuasion. Notably, highly persuasive users tend to use narrative less, and storytelling does not demonstrate increased effectiveness for any specific type of persuasive goal. These findings suggest that narrative may play a limited and context-dependent role in online discussions, highlighting the need for computational models of argumentation to account for rhetorical diversity.
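The analysis pipeline suggested by this abstract can be sketched as follows: run a story detector over comments, then compare narrative rates across discussion outcomes. The checkpoint name is a placeholder to be replaced with an actual story-detection model; the label scheme is an assumption.

```python
# Sketch: flag narrative comments, then compare narrative rates between
# delta-awarded and other comments. Checkpoint name is a placeholder.
from transformers import pipeline

detect = pipeline("text-classification",
                  model="your-org/story-detector")  # hypothetical checkpoint

comments = [
    {"text": "When I was 19, I moved abroad and everything changed ...",
     "got_delta": True},
    {"text": "Statistically, ownership rates correlate with ...",
     "got_delta": False},
]

for c in comments:
    # Assumed label scheme: the detector emits "STORY" vs. "NOT_STORY".
    c["is_story"] = detect(c["text"][:512])[0]["label"] == "STORY"

def narrative_rate(flag: bool) -> float:
    group = [c for c in comments if c["got_delta"] == flag]
    return sum(c["is_story"] for c in group) / max(1, len(group))

print(f"delta: {narrative_rate(True):.2f}, no delta: {narrative_rate(False):.2f}")
```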
Hybrid Intelligence for Logical Fallacy Detection
Mariia Kutepova | Khalid Al Khatib
Proceedings of the Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP)
This study investigates the impact of Hybrid Intelligence (HI) on improving the detection of logical fallacies, addressing the pressing challenge of misinformation prevalent across communication platforms. Employing a between-subjects experimental design, the research compares the performance of two groups: one relying exclusively on human judgment and another supported by an AI assistant. Participants evaluated a series of statements, with the AI-assisted group utilizing a custom ChatGPT-based chatbot that provided real-time hints and clarifications. The findings reveal a significant improvement in fallacy detection with AI support, increasing from an F1-score of 0.76 in the human-only group to 0.90 in the AI-assisted group. Despite this enhancement, both groups struggled to accurately identify non-fallacious statements, highlighting the need to further refine how AI assistance is leveraged.
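For readers unfamiliar with the reported metric, here is a toy computation of the per-condition F1-score; the label vectors are invented for illustration and merely mimic the direction of the reported 0.76 vs. 0.90 gap.

```python
# Toy F1 computation for the two experimental conditions (invented labels).
from sklearn.metrics import f1_score

gold        = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = fallacious, 0 = not
human_only  = [1, 1, 0, 1, 0, 1, 1, 0]  # misses one fallacy, two false alarms
ai_assisted = [1, 1, 1, 1, 0, 1, 0, 0]  # one false alarm

print(f1_score(gold, human_only))   # ≈ 0.67: lower, as in the human-only group
print(f1_score(gold, ai_assisted))  # ≈ 0.89: higher, as with AI support
```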
2024
GroningenAnnotatesGaza at the FIGNEWS 2024 Shared Task: Analyzing Bias in Conflict Narratives
Khalid Al Khatib | Sara Gemelli | Saskia Heisterborg | Pritha Majumdar | Gosse Minnema | Arianna Muti | Noa Solissa
Proceedings of the Second Arabic Natural Language Processing Conference
In this paper, we report on the development of our annotation methodology for the FIGNEWS 2024 shared task. The objective of the shared task is to examine the layers of bias in how the war on Gaza is represented in media narratives. Our methodology follows the prescriptive paradigm, in which guidelines are detailed and refined through an iterative process where edge cases are discussed until consensus is reached. Our IAA score (Krippendorff's α) is 0.420, highlighting the challenging and subjective nature of the task. Our results show that 52% of posts were unbiased, 42% biased against Palestine, 5% biased against Israel, and 3% biased against both. 16% were unclear or not applicable.
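As a pointer for reproducing an agreement figure like the one above, this is a minimal sketch assuming the third-party `krippendorff` package (pip install krippendorff); the ratings matrix below is toy data, not the shared-task annotations.

```python
# Sketch: nominal Krippendorff's alpha over an annotators-by-posts matrix.
import numpy as np
import krippendorff  # assumed third-party package

# Rows = annotators, columns = posts. Toy codes: 0 unbiased,
# 1 biased against Palestine, 2 biased against Israel, 3 biased against
# both; np.nan marks posts an annotator did not label.
ratings = np.array([
    [0, 1, 1, 2, 0, np.nan],
    [0, 1, 0, 2, 3, 0],
    [0, 1, 1, np.nan, 0, 0],
])

print(krippendorff.alpha(reliability_data=ratings,
                         level_of_measurement="nominal"))
```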
Language is Scary when Over-Analyzed: Unpacking Implied Misogynistic Reasoning with Argumentation Theory-Driven Prompts
Arianna Muti | Federico Ruggeri | Khalid Al Khatib | Alberto Barrón-Cedeño | Tommaso Caselli
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We propose misogyny detection as an Argumentative Reasoning task, and we investigate the capacity of large language models (LLMs) to understand the implicit reasoning used to convey misogyny in both Italian and English. The central aim is to generate the missing reasoning link between a message and the implied meanings encoding the misogyny. Our study uses argumentation theory as a foundation to form a collection of prompts in both zero-shot and few-shot settings. These prompts integrate different techniques, including chain-of-thought reasoning and augmented knowledge. Our findings show that LLMs fall short in reasoning about misogynistic comments and that they mostly rely on implicit knowledge derived from internalized common stereotypes about women to generate implied assumptions, rather than on inductive reasoning.
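The prompt design described in this abstract can be sketched roughly as below: the model is asked to reconstruct the unstated premise (warrant) linking a message to its implied meaning, with chain-of-thought wording and one few-shot demonstration. The template text and the demonstration are assumptions, not the paper's prompts.

```python
# Sketch of an argumentation-theory-driven prompt: elicit the missing
# premise between a message and its implied meaning. Wording is assumed.
FEW_SHOT = (
    'Message: "She only got the job because of quotas."\n'
    "Implied meaning: women are hired for their gender, not their competence.\n"
    "Missing premise: if a woman is hired, her qualifications are secondary."
)

def build_prompt(message: str, implied: str) -> str:
    return (
        "You are an argumentation analyst. An argument has a claim, stated "
        "premises, and often an unstated premise (warrant).\n"
        "Reason step by step, then state the missing premise connecting the "
        "message to its implied meaning.\n\n"
        f"Example:\n{FEW_SHOT}\n\n"
        f"Message: {message}\nImplied meaning: {implied}\nMissing premise:"
    )
```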
Improving Argument Effectiveness Across Ideologies using Instruction-tuned Large Language Models
Roxanne El Baff | Khalid Al Khatib | Milad Alshomary | Kai Konen | Benno Stein | Henning Wachsmuth
Findings of the Association for Computational Linguistics: EMNLP 2024
Different political ideologies (e.g., liberal and conservative Americans) hold different worldviews, which leads to opposing stances on different issues (e.g., gun control), thereby fostering societal polarization. Arguments are a means of bringing the perspectives of people with different ideologies closer together, depending on how well they reach their audience. In this paper, we study how to computationally turn ineffective arguments into effective arguments for people with certain ideologies by using instruction-tuned large language models (LLMs), looking closely at style features. For development and evaluation, we collect ineffective arguments per ideology from debate.org and rewrite about 30k of them using three LLM methods tailored to our task: zero-shot prompting, few-shot prompting, and LLM steering. Our experiments provide evidence that LLMs naturally improve argument effectiveness for liberals. Our LLM-based and human evaluations show a clear preference for the rewritten arguments. Code and a link to the data are available here: https://github.com/roxanneelbaff/emnlp2024-iesta.
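A minimal sketch of the few-shot variant described above follows: the prompt conditions the model on the target audience's ideology and shows one ineffective-to-effective pair before the argument to rewrite. The wording is an assumption; the linked repository contains the authors' actual prompts and data.

```python
# Few-shot rewriting sketch: audience-conditioned, style-only rewriting.
def rewrite_prompt(argument: str, ideology: str,
                   demo: tuple[str, str]) -> str:
    weak, strong = demo  # one ineffective/effective demonstration pair
    return (
        f"Rewrite arguments so they are effective for a {ideology} audience. "
        "Keep the stance and content fixed; change only the style.\n\n"
        f"Ineffective: {weak}\nEffective: {strong}\n\n"
        f"Ineffective: {argument}\nEffective:"
    )
```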