A Multi-Level Benchmark for Causal Language Understanding in Social Media Discourse

Xiaohan Ding; Kaike Ping; Buse Çarık; Eugenia Rho

Abstract

Understanding causal language in informal discourse is a core yet underexplored challenge in NLP. Existing datasets largely focus on explicit causality in structured text, providing limited support for detecting implicit causal expressions, particularly those found in informal, user-generated social media posts. We introduce CausalTalk, a multi-level dataset of five years of Reddit posts (2020–2024) discussing public health related to the COVID-19 pandemic, among which 10,120 posts are annotated across four causal tasks: (1) binary causal classification, (2) explicit vs. implicit causality, (3) cause–effect span extraction, and (4) causal gist generation. Annotations comprise both gold-standard labels created by domain experts and silver-standard labels generated by GPT-4o and verified by human annotators. CausalTalk bridges fine-grained causal detection and gist-based reasoning over informal text. It enables benchmarking across both discriminative and generative models, and provides a rich resource for studying causal reasoning in social media contexts.

Anthology ID: 2025.emnlp-main.1464
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 28764–28778
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1464/
Cite (ACL): Xiaohan Ding, Kaike Ping, Buse Çarık, and Eugenia Rho. 2025. A Multi-Level Benchmark for Causal Language Understanding in Social Media Discourse. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 28764–28778, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): A Multi-Level Benchmark for Causal Language Understanding in Social Media Discourse (Ding et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1464.pdf
Checklist: 2025.emnlp-main.1464.checklist.pdf
