Tania Chakraborty
2026
Splits! Flexible Sociocultural Linguistic Investigation at Scale
Eylon Caplan | Tania Chakraborty | Dan Goldwasser
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Variation in language use, shaped by speakers’ sociocultural background and specific context of use, offers a rich lens into cultural perspectives, values, and opinions. For example, Chinese students discuss *healthy eating* with words like *timing*, *regularity*, and *digestion*, whereas Americans use vocabulary like *balancing food groups* and *avoiding fat and sugar*, reflecting distinct cultural models of nutrition (Banna et al., 2016). The computational study of these Sociocultural Linguistic Phenomena (SLP) has traditionally been done in NLP via tailored analyses of specific groups or topics, requiring specialized data collection and experimental operationalization—a process not well-suited to quick hypothesis exploration and prototyping. To address this, we propose constructing a "sandbox" designed for systematic and flexible sociolinguistic research. Using our method, we construct a demographically/topically split Reddit dataset, **Splits!**, validated by self-identification and by replicating several known SLPs from existing literature. We showcase the sandbox’s utility with a scalable, two-stage process that filters large collections of *potential* SLPs (PSLPs) to surface the most promising candidates for deeper, qualitative investigation.
2025
VIBE: Can a VLM Read the Room?
Tania Chakraborty | Eylon Caplan | Dan Goldwasser
Findings of the Association for Computational Linguistics: EMNLP 2025
Understanding human social behavior, such as recognizing emotions and the social dynamics causing them, is an important and challenging problem. While LLMs have made remarkable advances, they are limited to the textual domain and cannot account for the major role that non-verbal cues play in understanding social situations. Vision Language Models (VLMs) can potentially account for this gap; however, their ability to make correct inferences over such social cues has received little attention. In this paper, we explore the capabilities of VLMs at social reasoning. We identify a previously overlooked limitation in VLMs: the Visual Social-Pragmatic Inference gap. To target this gap, we propose a new task for VLMs: Visual Social-Pragmatic Inference. We construct a high-quality dataset to test the abilities of a VLM for this task and benchmark the performance of several VLMs on it.