Proceedings of the Third Workshop on Social Influence in Conversations (SICon 2025)

James Hale, Brian Deuksin Kwon, Ritam Dutt (Editors)


Anthology ID: 2025.sicon-1
Month: July
Year: 2025
Address: Vienna, Austria
Venues: SICon | WS
Publisher: Association for Computational Linguistics
URL: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.sicon-1/
ISBN: 979-8-89176-266-4
PDF: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.sicon-1.pdf

Proceedings of the Third Workshop on Social Influence in Conversations (SICon 2025)
James Hale | Brian Deuksin Kwon | Ritam Dutt

LLM Roleplay: Simulating Human-Chatbot Interaction
Hovhannes Tamoyan | Hendrik Schuff | Iryna Gurevych

The development of chatbots requires collecting a large number of human-chatbot dialogues to reflect the breadth of users’ sociodemographic backgrounds and conversational goals. However, the resource requirements to conduct the respective user studies can be prohibitively high and often only allow for a narrow analysis of specific dialogue goals and participant demographics. In this paper, we propose LLM Roleplay, the first comprehensive method integrating multi-turn human-chatbot interaction simulation, explicit persona construction from sociodemographic traits, goal-driven dialogue planning, and robust handling of conversational failures, enabling broad utility and reliable dialogue generation. To validate our method, we collect natural human-chatbot dialogues from different sociodemographic groups and conduct a user study to compare these with our generated dialogues. We evaluate the capabilities of state-of-the-art LLMs in maintaining a conversation during their embodiment of a specific persona and find that our method can simulate human-chatbot dialogues with a high indistinguishability rate.
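The abstract does not include code, but the simulation it describes lends itself to a simple two-model loop. The sketch below is a minimal illustration of that idea: a persona-conditioned "user" model talks to a chatbot until a goal or failure signal is hit. The prompt template, stop conditions, and function names are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a persona-driven human-chatbot simulation loop.
# The prompt template, goal check, and failure check are illustrative
# placeholders, not the LLM Roleplay method itself.

def build_persona_prompt(traits: dict, goal: str) -> str:
    """Turn sociodemographic traits and a dialogue goal into a system prompt."""
    described = ", ".join(f"{k}: {v}" for k, v in traits.items())
    return (f"You are a person with the following background: {described}. "
            f"You are talking to a chatbot because you want to: {goal}. "
            f"Stay in character and write one short user message at a time.")

def simulate_dialogue(user_llm, chatbot_llm, traits, goal, max_turns=10):
    """Alternate between a persona-conditioned user model and a chatbot."""
    persona_prompt = build_persona_prompt(traits, goal)
    history = []
    for _ in range(max_turns):
        user_msg = user_llm(persona_prompt, history)   # persona speaks
        history.append(("user", user_msg))
        if "GOAL_REACHED" in user_msg:                 # simple stop signal
            break
        bot_msg = chatbot_llm(history)                 # chatbot answers
        history.append(("assistant", bot_msg))
        if not bot_msg.strip():                        # crude failure check
            break
    return history
```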

Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks
Anders Giovanni Møller | Luca Maria Aiello

Large Language Models are expressive tools that enable complex tasks of text understanding within Computational Social Science. Their versatility, while beneficial, poses a barrier to establishing standardized best practices within the field. To bring clarity to the value of different strategies, we present an overview of the performance of modern LLM-based classification methods on a benchmark of 23 social knowledge tasks. Our results point to three best practices: prioritize models with larger vocabularies and pre-training corpora; avoid simple zero-shot prompting in favor of AI-enhanced prompting; and fine-tune on task-specific data, considering more complex forms of instruction-tuning on multiple datasets only when training data is abundant.
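To make the second recommendation concrete, the following sketch contrasts a plain zero-shot prompt with a reasoning-augmented prompt for a social-knowledge classification item. The task, label set, prompt wording, and parsing are invented examples, not the benchmark items or prompts used in the paper.

```python
# Illustrative contrast between a plain zero-shot prompt and a richer,
# reasoning-augmented ("AI-enhanced") prompt for a classification item.
# Task, labels, and wording are invented for illustration only.

TEXT = "I can't believe you'd say that to your own brother."
LABELS = ["empathetic", "neutral", "hostile"]

zero_shot_prompt = (
    f"Classify the following utterance as one of {LABELS}.\n"
    f"Utterance: {TEXT}\nLabel:"
)

enhanced_prompt = (
    f"Classify the following utterance as one of {LABELS}.\n"
    f"Utterance: {TEXT}\n"
    "First, briefly reason about the speaker's intent and the social context.\n"
    "Then give the final label on its own line, prefixed with 'Label:'."
)

def classify(llm, prompt: str) -> str:
    """Extract the label from the model's completion (placeholder parsing)."""
    completion = llm(prompt)
    return completion.split("Label:")[-1].strip()
```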

DecepBench: Benchmarking Multimodal Deception Detection
Ethan Braverman | Vittesh Maganti | Nysa Lalye | Akhil Ganti | Michael Lu | Kevin Zhu | Vasu Sharma | Sean O’Brien

Deception detection is crucial in domains such as security, forensics, and legal proceedings, as well as to ensure the reliability of AI systems. However, current approaches are limited by the lack of generalizable and interpretable benchmarks built on large and diverse datasets. To address this gap, we introduce DecepBench, a comprehensive and robust benchmark for multimodal deception detection. DecepBench includes an enhanced version of the DOLOS dataset, the largest game-show deception dataset (1,700 labeled video clips with audio). We augment each video clip with transcripts, introducing a third modality (text) and incorporating deception-related features identified in psychological research. We employ explainable methods to evaluate the relevance of key deception cues, providing insights into model limitations and guiding future improvements. Our enhancements to DOLOS, combined with these interpretable analyses, yield improved performance and a deeper understanding of multimodal deception detection.
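As a rough illustration of what a record in such a three-modality benchmark might carry, the sketch below pairs a video clip with its audio, transcript, label, and psychology-motivated cue features. Field names and values are hypothetical, not DecepBench's actual schema.

```python
# Hypothetical record layout for a clip in a DOLOS-style multimodal
# deception benchmark. Field names are illustrative, not DecepBench's schema.

from dataclasses import dataclass, field

@dataclass
class DeceptionClip:
    clip_id: str
    video_path: str                 # visual modality
    audio_path: str                 # acoustic modality
    transcript: str                 # added text modality
    is_deceptive: bool              # ground-truth label
    cue_features: dict = field(default_factory=dict)  # psychology-motivated cues

example = DeceptionClip(
    clip_id="clip_0001",
    video_path="clips/clip_0001.mp4",
    audio_path="clips/clip_0001.wav",
    transcript="I have never met that person in my life.",
    is_deceptive=True,
    cue_features={"hedging": 0, "negation": 1, "speech_rate_z": -0.4},
)
```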

Should I go vegan: Evaluating the Persuasiveness of LLMs in Persona-Grounded Dialogues
Shruthi Chockkalingam | Seyed Hossein Alavi | Raymond T. Ng | Vered Shwartz

As the use of large language models becomes ever more prevalent, understanding their persuasive abilities, in ways that can be both beneficial and harmful to humans, is an important task. Previous work has focused on persuasion in the context of negotiations, political debate and advertising. We instead shift the focus to a more realistic setup of a dialogue between a persuadee with an everyday dilemma (e.g., whether to switch to a vegan diet or not) and a persuader, with no prior knowledge about the persuadee, who tries to persuade them towards a certain decision using arguments they feel are best suited to the persuadee’s persona. We collect and analyze conversations between a human persuadee and either a human persuader or an LLM persuader based on GPT-4. We find that, in this setting, GPT-4 is perceived as both more persuasive and more empathetic, whereas humans are more skilled at discovering new information about the person they are speaking to. This research provides the groundwork for future work predicting the persuasiveness of utterances in conversation across a range of topics.

PROTECT: Policy-Related Organizational Value Taxonomy for Ethical Compliance and Trust
Avni Mittal | Sree Hari Nagaralu | Sandipan Dandapat

This paper presents PROTECT, a novel policy-driven organizational value taxonomy designed to enhance ethical compliance and trust within organizations. Drawing on established human value systems and leveraging large language models, PROTECT generates values tailored to organizational contexts and clusters them into a refined taxonomy. This taxonomy serves as the basis for creating a comprehensive dataset of compliance scenarios, each linked to specific values and paired with both compliant and non-compliant responses. By systematically varying value emphasis, we illustrate how different LLM personas emerge, reflecting diverse compliance behaviors. The dataset, directly grounded in the taxonomy, enables consistent evaluation and training of LLMs on value-sensitive tasks. While PROTECT offers a robust foundation for aligning AI systems with organizational standards, our experiments also reveal current limitations in model accuracy, highlighting the need for further improvements. Together, the taxonomy and dataset represent complementary, foundational contributions toward value-aligned AI in organizational settings.
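The structure of one such compliance scenario can be illustrated with a small record that links a value to a scenario and a compliant/non-compliant response pair. The value name, scenario, and responses below are invented examples, not items from the PROTECT dataset.

```python
# Illustrative shape of one compliance-scenario record grounded in a value
# taxonomy. Contents are invented, not drawn from the PROTECT dataset.

scenario_record = {
    "value": "confidentiality",
    "scenario": ("A colleague asks you to share a customer's contract "
                 "details to speed up their own deal."),
    "compliant_response": ("I can't share another customer's contract; "
                           "let's ask the account owner for an approved summary."),
    "non_compliant_response": ("Sure, I'll forward you the contract, "
                               "just don't tell anyone where you got it."),
}
```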

Too Polite to be Human: Evaluating LLM Empathy in Korean Conversations via a DCT-Based Framework
Seoyoon Park | Jaehee Kim | Hansaem Kim

As LLMs are increasingly used in global conversational settings, concerns remain about their ability to handle complex sociocultural contexts. This study evaluates LLMs’ empathetic understanding in Korean—a high-context language—using a pragmatics-based Discourse Completion Task (DCT) focused on interpretive judgment rather than generation. We constructed a dataset varying relational hierarchy, intimacy, and emotional valence, and compared responses from proprietary and open-source LLMs to those of Korean speakers. Most LLMs showed over-empathizing tendencies and struggled with ambiguous relational cues. Neither model size nor Korean fine-tuning significantly improved performance. While humans reflected relational nuance and contextual awareness, LLMs relied on surface strategies. These findings underscore LLMs’ limits in socio-pragmatic reasoning and introduce a scalable, culturally flexible framework for evaluating socially-aware AI.

Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models
Maria Teleki | Xiangjue Dong | Haoran Liu | James Caverlee

Masculine discourse words are discourse terms that are both socially normative and statistically associated with male speakers. We propose a twofold framework for (i) the large-scale discovery and analysis of gendered discourse words in spoken content via our Gendered Discourse Correlation Framework; and (ii) the measurement of the gender bias associated with these words in LLMs via our Discourse Word-Embedding Association Test. We focus our study on podcasts, a popular and growing form of social media, analyzing 15,117 podcast episodes. We analyze correlations between gender and discourse words – discovered via LDA and BERTopic. We then find that gendered discourse-based masculine defaults exist in the domains of business, technology/politics, and video games, indicating that these gendered discourse words are socially influential. Next, we study the representation of these words from a state-of-the-art LLM embedding model from OpenAI, and find that the masculine discourse words have a more stable and robust representation than the feminine discourse words, which may result in better system performance on downstream tasks for men. Hence, men are rewarded for their discourse patterns with better system performance – and this embedding disparity constitutes a representational harm and a masculine default.
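For readers unfamiliar with WEAT-style tests, the sketch below computes a generic embedding-association effect size between two sets of discourse-word vectors and two sets of gendered attribute vectors. It only illustrates the family of tests the paper's Discourse Word-Embedding Association Test builds on; the exact statistic and word sets in the paper may differ.

```python
# Generic WEAT-style association score over word embeddings; the paper's
# D-WEAT statistic and word sets may differ from this illustration.

import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, male_attrs, female_attrs):
    """Mean similarity to male attribute vectors minus female ones."""
    return (np.mean([cosine(w, a) for a in male_attrs])
            - np.mean([cosine(w, a) for a in female_attrs]))

def weat_effect(target_masc, target_fem, male_attrs, female_attrs):
    """Effect-size-style contrast between two sets of discourse-word vectors."""
    s_masc = [association(w, male_attrs, female_attrs) for w in target_masc]
    s_fem = [association(w, male_attrs, female_attrs) for w in target_fem]
    pooled = np.std(s_masc + s_fem, ddof=1)
    return (np.mean(s_masc) - np.mean(s_fem)) / pooled

# Toy usage with random placeholder vectors standing in for real embeddings:
rng = np.random.default_rng(0)
vecs = lambda n: [rng.normal(size=16) for _ in range(n)]
print(weat_effect(vecs(5), vecs(5), vecs(4), vecs(4)))
```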

Unmasking the Strategists: An Intent-Driven Multi-Agent Framework for Analyzing Manipulation in Courtroom Dialogues
Disha Sheshanarayana | Tanishka Magar | Ayushi Mittal | Neelam Chaplot

Courtrooms are places where lives are determined and fates are sealed, yet they are not impervious to manipulation. Strategic use of manipulation in legal jargon can sway the opinions of judges and affect the decisions. Despite the growing advancements in NLP, its application in detecting and analyzing manipulation within the legal domain remains largely unexplored. Our work addresses this gap by introducing LegalCon, a dataset of 1,063 annotated courtroom conversations labeled for manipulation detection, identification of primary manipulators, and classification of manipulative techniques, with a focus on long conversations. Furthermore, we propose CLAIM, a two-stage, Intent-driven Multi-agent framework designed to enhance manipulation analysis by enabling context-aware and informed decision-making. Our results highlight the potential of incorporating agentic frameworks to improve fairness and transparency in judicial processes. We hope that this contributes to the broader application of NLP in legal discourse analysis and the development of robust tools to support fairness in legal decision-making. Our code and data are available at CLAIM.
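A very rough sketch of the two-stage, intent-then-analysis pipeline the abstract describes is given below. The agent roles, prompts, and outputs are assumptions made for illustration; this is not the CLAIM implementation.

```python
# Rough sketch of a two-stage, intent-driven analysis pipeline.
# Agent roles and prompts are hypothetical, not the CLAIM framework itself.

def stage_one_intents(llm, conversation):
    """Ask an 'intent agent' to label the intent behind each utterance."""
    return [llm(f"Speaker {s}: \"{u}\"\nState this utterance's intent in one phrase.")
            for s, u in conversation]

def stage_two_analysis(llm, conversation, intents):
    """Ask an 'analysis agent' to judge manipulation given utterances plus intents."""
    annotated = "\n".join(f"{s}: {u} [intent: {i}]"
                          for (s, u), i in zip(conversation, intents))
    return llm("Given this intent-annotated courtroom exchange, decide whether "
               "anyone is being manipulative, who the primary manipulator is, "
               f"and which manipulative technique is used.\n{annotated}")
```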

Steering Conversational Large Language Models for Long Emotional Support Conversations
Navid Madani | Rohini Srihari

In this study, we address the challenge of having large language models (LLMs) consistently follow emotional support strategies in long conversations. We introduce the Strategy-Relevant Attention (SRA) metric, a model-agnostic measure designed to evaluate the effectiveness of LLMs in adhering to strategic prompts in emotional support contexts. By analyzing conversations within the Emotional Support Conversations dataset (ESConv) using LLaMA models, we demonstrate that SRA is significantly correlated with a model’s ability to sustain the outlined strategy throughout the interactions. Our findings reveal that the application of SRA-informed prompts leads to enhanced strategic adherence, resulting in conversations that more reliably exhibit the desired emotional support strategies over longer conversations. Furthermore, we contribute a comprehensive, multi-branch synthetic conversation dataset for ESConv, featuring a variety of strategy continuations informed by our optimized prompting method. The code and data are publicly available on our GitHub.
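One plausible reading of an attention-based adherence measure is the share of the generated tokens' attention that falls on the strategy portion of the prompt. The sketch below computes exactly that from an attention matrix averaged over heads and layers; it is an assumption-laden illustration, not the paper's exact SRA definition.

```python
# Minimal sketch of an attention-mass measure over strategy-prompt tokens.
# This is one plausible reading of "strategy-relevant attention", not the
# paper's exact SRA formula.

import numpy as np

def strategy_attention_share(attn: np.ndarray, strategy_idx, response_idx) -> float:
    """Fraction of the response tokens' attention that lands on strategy tokens.

    attn[i, j] = attention from query token i to key token j,
    averaged over heads and layers.
    """
    response_rows = attn[response_idx]              # queries from generated tokens
    on_strategy = response_rows[:, strategy_idx].sum()
    total = response_rows.sum()
    return float(on_strategy / total)

# Toy usage with a random, row-normalized attention matrix over 10 tokens:
rng = np.random.default_rng(0)
attn = rng.random((10, 10))
attn /= attn.sum(axis=1, keepdims=True)             # row-normalize like softmax
print(strategy_attention_share(attn, strategy_idx=[0, 1, 2], response_idx=[7, 8, 9]))
```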

Text Overlap: An LLM with Human-like Conversational Behaviors
JiWoo Kim | Minsuk Chang | JinYeong Bak

Traditional text-based human-AI interactions typically follow a strict turn-taking approach. This rigid structure limits conversational flow, unlike natural human conversations, which can freely incorporate overlapping speech. However, our pilot study suggests that even in text-based interfaces, overlapping behaviors such as backchanneling and proactive responses lead to more natural and functional exchanges. Motivated by these findings, we introduce text-based overlapping interactions as a new challenge in human-AI communication, characterized by real-time typing, diverse response types, and interruptions. To enable AI systems to handle such interactions, we define three core tasks: deciding when to overlap, selecting the response type, and generating utterances. We construct a synthetic dataset for these tasks and train OverlapBot, an LLM-driven chatbot designed to engage in text-based overlapping interactions. Quantitative and qualitative evaluations show that OverlapBot increases turn exchanges compared to traditional turn-taking systems, with users making 72% more turns and the chatbot 130% more turns, which end-users perceive as efficient. These findings suggest that overlapping interactions enhance communicative efficiency and engagement.
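The three tasks the abstract names map naturally onto three functions, sketched below with deliberately crude heuristics. The trigger conditions, response types, and prompt are invented placeholders, not OverlapBot's trained components.

```python
# Sketch of the three overlap tasks: deciding when to overlap, choosing a
# response type, and generating an utterance. Heuristics and response types
# are invented placeholders, not OverlapBot's trained models.

RESPONSE_TYPES = ["backchannel", "clarifying_question", "proactive_answer"]

def should_overlap(partial_user_text: str) -> bool:
    """Crude trigger: overlap once the user has typed a long enough clause."""
    return len(partial_user_text.split()) >= 8 and partial_user_text.endswith((",", " and"))

def choose_response_type(partial_user_text: str) -> str:
    if "?" in partial_user_text:
        return "proactive_answer"
    if partial_user_text.rstrip().endswith(","):
        return "backchannel"
    return "clarifying_question"

def generate_overlap(llm, partial_user_text: str, response_type: str) -> str:
    return llm(f"The user is still typing: \"{partial_user_text}\". "
               f"Produce a short {response_type} without taking the full turn.")
```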

Social Influence in Consumer Response to Advertising: A Model of Conversational Engagement
Javier Marín

This paper explores social influence in consumer responses to advertising through investment-mediated conversational dynamics. We implement conversational engagement via advertising expenditure patterns, recognizing that marketing spend directly translates into conversational volume and reach across multi-channel ecosystems. Our approach integrates social psychology frameworks with statistical physics analogies as epistemic scaffolding, following Ruse’s “analogy as heuristic” idea. The model introduces three parameters (Marketing Sensitivity, Response Sensitivity, and Behavioral Sensitivity) quantifying emergent properties of investment-driven influence networks. Validation against three real-world datasets shows competitive performance compared to conventional consumer-response-curve models such as the Michaelis-Menten and Hill equations, with context-dependent advantages in network-driven scenarios. These findings illustrate how advertising ecosystems operate as complex adaptive systems (CAS) where influence propagates through investment-amplified conversational networks.
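For reference, the two baseline response curves mentioned above are standard saturation models (notation ours, not the paper's): writing x for advertising spend, R_max for the saturation level, K for the half-saturation spend, and n for the Hill coefficient,

\[ R_{\mathrm{MM}}(x) = \frac{R_{\max}\, x}{K + x}, \qquad R_{\mathrm{Hill}}(x) = \frac{R_{\max}\, x^{n}}{K^{n} + x^{n}}. \]

The Michaelis-Menten form is the n = 1 special case of the Hill form; larger n gives a steeper, more threshold-like response to spend.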

Extended Abstract: Probing-Guided Parameter-Efficient Fine-Tuning for Balancing Linguistic Adaptation and Safety in LLM-based Social Influence Systems
Manyana Tiwari

Designing effective LLMs for social influence (SI) tasks demands controlling linguistic output such that it adapts to context (such as user attributes, history, etc.) while upholding ethical guardrails. Standard Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA struggle to manage the trade-off between adaptive linguistic expression and safety, as they optimize overall objectives without differentiating the functional roles of internal model components. We therefore introduce Probing-Guided PEFT (PG-PEFT), a novel fine-tuning strategy that uses interpretability probes to identify LLM components associated with context-driven linguistic variation versus those linked to safety violations (e.g., toxicity, bias). This functional map then guides LoRA updates, enabling more targeted control over the model’s linguistic output. We evaluate PG-PEFT on SI tasks (persuasion, negotiation), measuring linguistic adaptability and safety against standard PEFT baselines.
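One simple way such a functional map could steer LoRA is by restricting which modules receive adapters. The sketch below selects modules whose (hypothetical) probe scores indicate strong context/style signal but weak safety risk; the scores, threshold, and module names are assumptions, and the procedure is only a stand-in for, not a reproduction of, PG-PEFT.

```python
# Sketch of probe-guided selection of LoRA target modules: keep modules whose
# probes indicate strong context/adaptation signal but weak safety-violation
# signal. Probe scores, threshold, and module names are hypothetical.

# Hypothetical per-module probe results: (adaptation_score, safety_risk_score)
probe_scores = {
    "model.layers.4.self_attn.q_proj":  (0.82, 0.10),
    "model.layers.4.self_attn.v_proj":  (0.77, 0.12),
    "model.layers.9.mlp.down_proj":     (0.35, 0.61),  # safety-linked: skip
    "model.layers.17.self_attn.q_proj": (0.71, 0.18),
}

target_modules = [name for name, (adapt, risk) in probe_scores.items()
                  if adapt > 0.5 and risk < 0.3]

# With Hugging Face's peft library, the selected modules could then be passed
# to a LoRA config, e.g.:
#   from peft import LoraConfig, get_peft_model
#   config = LoraConfig(r=8, lora_alpha=16, target_modules=target_modules)
#   model = get_peft_model(base_model, config)
print(target_modules)
```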