Eylon Caplan
2026
Splits! Flexible Sociocultural Linguistic Investigation at Scale
Eylon Caplan | Tania Chakraborty | Dan Goldwasser
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Variation in language use, shaped by speakers’ sociocultural background and specific context of use, offers a rich lens into cultural perspectives, values, and opinions. For example, Chinese students discuss *healthy eating* with words like *timing*, *regularity*, and *digestion*, whereas Americans use vocabulary like *balancing food groups* and *avoiding fat and sugar*, reflecting distinct cultural models of nutrition (Banna et al., 2016). The computational study of these Sociocultural Linguistic Phenomena (SLP) has traditionally been done in NLP via tailored analyses of specific groups or topics, requiring specialized data collection and experimental operationalization—a process not well-suited to quick hypothesis exploration and prototyping. To address this, we propose constructing a "sandbox" designed for systematic and flexible sociolinguistic research. Using our method, we construct a demographically/topically split Reddit dataset, **Splits!**, validated by self-identification and by replicating several known SLPs from existing literature. We showcase the sandbox’s utility with a scalable, two-stage process that filters large collections of *potential* SLPs (PSLPs) to surface the most promising candidates for deeper, qualitative investigation.
TAIGR: Towards Modeling Influencer Content on Social Media via Structured, Pragmatic Inference
Nishanth Sridhar Nakshatri | Eylon Caplan | Rajkumar Pujari | Dan Goldwasser
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Health influencers play a growing role in shaping public beliefs, yet their content is often conveyed through conversational narratives and rhetorical strategies rather than explicit factual claims. As a result, claim-centric verification methods struggle to capture the pragmatic meaning of influencer discourse. In this paper, we propose TAIGR (Takeaway Argumentation Inference with Grounded References), a structured framework for analyzing influencer discourse that operates in three stages: (1) identifying the influencer's core recommendation (the takeaway); (2) constructing an argumentation graph that captures the influencer's justification for the takeaway; (3) performing factor graph-based probabilistic inference to validate the takeaway. We evaluate TAIGR on a content validation task over influencer video transcripts on health, showing that accurate validation requires modeling the discourse’s pragmatic and argumentative structure rather than treating transcripts as flat collections of claims.
2025
VIBE: Can a VLM Read the Room?
Tania Chakraborty | Eylon Caplan | Dan Goldwasser
Findings of the Association for Computational Linguistics: EMNLP 2025
Understanding human social behavior, such as recognizing emotions and the social dynamics that cause them, is an important and challenging problem. While LLMs have made remarkable advances, they are limited to the textual domain and cannot account for the major role that non-verbal cues play in understanding social situations. Vision Language Models (VLMs) can potentially close this gap; however, their ability to make correct inferences over such social cues has received little attention. In this paper, we explore the capabilities of VLMs at social reasoning. We identify a previously overlooked limitation in VLMs: the Visual Social-Pragmatic Inference gap. To target this gap, we propose a new task for VLMs: Visual Social-Pragmatic Inference. We construct a high-quality dataset to test the abilities of VLMs on this task and benchmark the performance of several VLMs on it.
ConceptCarve: Dynamic Realization of Evidence
Eylon Caplan | Dan Goldwasser
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Finding evidence for human opinion and behavior at scale is a challenging task, often requiring an understanding of sophisticated thought patterns among vast online communities found on social media. For example, studying how ‘gun ownership’ is related to the perception of ‘Freedom’ requires a retrieval system that can operate at scale over social media posts while dealing with two key challenges: (1) identifying instances of abstract concepts, and (2) handling the fact that these concepts can be instantiated differently across communities. To address these, we introduce ConceptCarve, an evidence retrieval framework that utilizes traditional retrievers and LLMs to dynamically characterize the search space during retrieval. Our experiments show that ConceptCarve surpasses traditional retrieval systems in finding evidence within a social media community. It also produces an interpretable representation of the evidence for that community, which we use to qualitatively analyze complex thought patterns that manifest differently across communities.