2025
SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing LLMs
Chuyi Kong | Ziyang Luo | Hongzhan Lin | Zhiyuan Fan | Yaxin Fan | Yuxi Sun | Jing Ma
Findings of the Association for Computational Linguistics: ACL 2025
The advanced role-playing capabilities of Large Language Models (LLMs) have enabled rich interactive scenarios, yet existing research on social interactions neglects hallucination while struggling with poor generalizability and implicit character fidelity judgments. To bridge this gap, motivated by human behavior, we introduce a generalizable and explicit paradigm for uncovering interactive patterns of LLMs across diverse worldviews. Specifically, we first define interactive hallucination through stance transfer, then construct SHARP, a benchmark built by extracting relations from commonsense knowledge graphs and utilizing LLMs’ inherent hallucination properties to simulate multi-role interactions. Extensive experiments confirm our paradigm’s effectiveness and stability, examine the factors that influence these metrics, and challenge conventional hallucination mitigation solutions. More broadly, our work reveals a fundamental limitation in popular post-training methods for role-playing LLMs: the tendency to obscure knowledge beneath style, resulting in monotonous yet human-like behaviors, namely interactive hallucination.
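The stance-transfer test admits a compact illustration. The Python sketch below is a rough approximation under our own naming, not the SHARP implementation: the stance classifier is stubbed out and every identifier is hypothetical. A role's stance on a claim is measured before and after a simulated interaction, and a flip counts as interactive hallucination.

# Minimal sketch of the stance-transfer test; names and labels are
# illustrative, not the SHARP implementation.

def get_stance(role: str, claim: str, dialogue: list[str]) -> str:
    """Stub for an LLM call that returns the role's stance on a claim
    ('support' or 'oppose') given the dialogue so far."""
    return "oppose" if dialogue else "support"  # placeholder behavior

def shows_interactive_hallucination(role: str, claim: str, dialogue: list[str]) -> bool:
    """Flag interactive hallucination when interaction with another role
    flips the character's stance away from its prior, knowledge-grounded one."""
    stance_before = get_stance(role, claim, dialogue=[])
    stance_after = get_stance(role, claim, dialogue=dialogue)
    return stance_after != stance_before

dialogue = ["Rival character: Everyone knows the sun orbits the earth."]
print(shows_interactive_hallucination("Hermione Granger", "The earth orbits the sun.", dialogue))
# -> True with this stub: the stance transferred under social pressure.

In a full pipeline, get_stance would query the role-playing LLM, and the claims would come from relations extracted from a commonsense knowledge graph, as the abstract describes.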
CausalAbstain: Enhancing Multilingual LLMs with Causal Reasoning for Trustworthy Abstention
Yuxi Sun | Aoqi Zuo | Wei Gao | Jing Ma
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Models (LLMs) often exhibit knowledge disparities across languages. Encouraging LLMs to abstain when faced with knowledge gaps is a promising strategy for reducing hallucinations in multilingual settings. Current abstention strategies for multilingual scenarios primarily rely on generating feedback in various languages using LLMs and performing self-reflection. However, these methods can be adversely impacted by inaccuracies and biases in the generated feedback. To address this, from a causal perspective, we introduce CausalAbstain, a method that helps LLMs determine whether to utilize multiple generated feedback responses and how to identify the most useful ones. Extensive experiments demonstrate that CausalAbstain effectively selects helpful feedback and enhances abstention decisions with interpretability in both native-language (Causal-native) and multilingual (Causal-multi) settings, outperforming strong baselines on two benchmark datasets covering encyclopedic and commonsense knowledge QA tasks.
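The two decisions the abstract names, whether to use the generated feedback and which responses are most useful, can be sketched as below. The helpers are hypothetical stand-ins, not the CausalAbstain method; in particular, feedback_effect only gestures at the causal-effect estimate the approach is built on.

# Minimal sketch of feedback selection plus abstention; the helpers are
# hypothetical stand-ins, not the CausalAbstain implementation.

def feedback_effect(question: str, feedback: str) -> float:
    """Stub for the estimated causal effect of a feedback response on the
    final answer; a real system would derive this from model behavior."""
    return 0.7 if "evidence" in feedback else 0.2  # placeholder

def answer_confidence(question: str, feedbacks: list[str]) -> float:
    """Stub for the model's confidence after conditioning on feedback."""
    return 0.4 + 0.1 * len(feedbacks)  # placeholder

def should_abstain(question: str, feedbacks: list[str],
                   effect_threshold: float = 0.5,
                   confidence_threshold: float = 0.6) -> bool:
    # Keep only feedback whose estimated effect on the answer is large enough.
    useful = [fb for fb in feedbacks if feedback_effect(question, fb) >= effect_threshold]
    # Abstain when confidence stays low even with the selected feedback.
    return answer_confidence(question, useful) < confidence_threshold

feedbacks = ["No evidence supports this claim.", "I am not sure."]
print(should_abstain("Who first climbed K2?", feedbacks))  # -> True with these stubs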
Explainable Ethical Assessment on Human Behaviors by Generating Conflicting Social Norms
Yuxi Sun | Wei Gao | Hongzhan Lin | Jing Ma | Wenxuan Zhang
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Human behaviors are often guided or constrained by social norms, which are defined as shared, commonsense rules. For example, underlying the action "report a witnessed crime" are social norms that inform our conduct, such as "It is expected to be brave to report crimes." Current AI systems that assess the valence (i.e., support or oppose) of human actions through large-scale data training, without grounding in explicit norms, can be difficult to explain and thus untrustworthy. Emulating human assessors by considering social norms can help AI models better understand and predict valence. When multiple norms come into play, conflicting norms can create tension and directly influence human behavior. For example, when deciding whether to report a witnessed crime, one may balance bravery against self-protection. In this paper, we introduce ClarityEthic, a novel ethical assessment approach that enhances valence prediction and explanation by generating the conflicting social norms behind human actions, and that strengthens the moral reasoning capabilities of language models through a contrastive learning strategy. Extensive experiments demonstrate that our method outperforms strong baseline approaches, and human evaluations confirm that the generated social norms provide plausible explanations for the assessment of human behaviors.
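The pipeline the abstract describes, generating the conflicting norms behind an action and then judging its valence, can be outlined as follows. Everything here is an illustrative stub under our own naming, not the ClarityEthic implementation, and the contrastive-learning training stage is elided.

# Minimal sketch of assessment via conflicting norms; prompts and helpers
# are illustrative, not the ClarityEthic implementation.

def generate_norm(action: str, valence: str) -> str:
    """Stub for an LLM call that verbalizes a social norm supporting or
    opposing the action."""
    norms = {
        "support": "It is expected to be brave and report crimes.",
        "oppose": "It is understandable to prioritize one's own safety.",
    }
    return norms[valence]  # placeholder

def assess(action: str) -> dict[str, str]:
    """Generate the two conflicting norms behind an action, then predict
    its valence; a trained model would weigh the norms, here hard-coded."""
    supporting = generate_norm(action, "support")
    opposing = generate_norm(action, "oppose")
    prediction = "support"  # placeholder for the learned judgment
    return {"valence": prediction, "pro_norm": supporting, "con_norm": opposing}

print(assess("report a witnessed crime"))

The generated norm pair doubles as the explanation: the prediction is reported together with the norm it follows and the norm it overrides.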