Xiaohan Ding

2025

A Multi-Level Benchmark for Causal Language Understanding in Social Media Discourse
Xiaohan Ding | Kaike Ping | Buse Çarık | Eugenia Rho
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Understanding causal language in informal discourse is a core yet underexplored challenge in NLP. Existing datasets largely focus on explicit causality in structured text, providing limited support for detecting implicit causal expressions, particularly those found in informal, user-generated social media posts. We introduce CausalTalk, a multi-level dataset of five years of Reddit posts (2020–2024) discussing public health related to the COVID-19 pandemic, among which 10,120 posts are annotated across four causal tasks: (1) binary causal classification, (2) explicit vs. implicit causality, (3) cause–effect span extraction, and (4) causal gist generation. Annotations comprise both gold-standard labels created by domain experts and silver-standard labels generated by GPT-4o and verified by human annotators. CausalTalk bridges fine-grained causal detection and gist-based reasoning over informal text. It enables benchmarking across both discriminative and generative models, and provides a rich resource for studying causal reasoning in social media contexts.

