Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse

Aisha Ali Al-Athba, Wajdi Zaghouani


Abstract
The study of online discourse has become central to understanding societal polarization. While much research has focused on detecting overt toxicity, the subtle dynamics of social cohesion, meaning the interaction between divisive and unifying narratives, remain computationally underexplored. This paper presents Cohesion-6K, a manually and ChatGPT-assisted annotated dataset of six thousand Arabic public Facebook posts related to the Israeli Occupation of Palestine. Each post is assigned to one of five discourse categories that represent a continuum from conflict to cohesion: Conflict, Resolution, Community Engagement, Supportive Interactions, and Shared Values. The annotation process combines expert human judgment with model-assisted pre-labeling verified by trained annotators, achieving substantial inter-annotator agreement (Cohen’s kappa = 0.85). Quantitative analysis reveals a consistent engagement gap, where conflict-oriented posts receive between two and four times more user interaction than resolution-oriented ones (p < 0.01). This pattern illustrates how divisive discourse tends to attract disproportionate visibility in Arabic social media spaces. Cohesion-6K provides a transparent and reproducible resource for the study of online cohesion and polarization. The dataset, annotation guidelines, and preprocessing code will be released for research use under an open license, supporting future work in computational social science, digital communication, and Arabic natural language processing.
Anthology ID:
2026.lrec-main.67
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
SIG:
Publisher:
ELRA Language Resource Association
Note:
Pages:
881–892
Language:
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.67/
DOI:
Bibkey:
Cite (ACL):
Aisha Ali Al-Athba and Wajdi Zaghouani. 2026. Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse. International Conference on Language Resources and Evaluation, main:881–892.
Cite (Informal):
Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse (Al-Athba & Zaghouani, LREC 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.67.pdf