Shuqi Zhu

2026

Large Language Models (LLMs) are exhibiting emergent human-like abilities and are envisioned as a tool for simulating an individual’s communication patterns, behaviors, and personality traits. However, current evaluations of LLM-based persona simulation remain limited: most rely on synthetic dialogues and lack fine-grained analysis of the capabilities that persona simulation requires. To address these limitations, we introduce TwinVoice, a comprehensive benchmark for assessing persona simulation across diverse real-world contexts. TwinVoice encompasses three dimensions: Social Persona (public social interactions), Interpersonal Persona (private dialogues), and Narrative Persona (role-based expression). It further decomposes the evaluation into six fundamental capabilities: opinion consistency, memory recall, logical reasoning, lexical fidelity, persona tone, and syntactic style. Experimental results reveal that while advanced models achieve moderate accuracy in persona simulation, they still fall short on capabilities such as syntactic style and memory recall. Our data, code, and evaluation results are available.

2025


As Large Language Models (LLMs) demonstrate increasingly strong human-like capabilities, the need to align them with human values has become increasingly important. Recent techniques, such as prompt learning and reinforcement learning, are being employed to bring LLMs into closer alignment with human values. While these techniques address broad ethical and helpfulness concerns, they rarely consider simulating individualized human values. To bridge this gap, we propose SimVBG, a framework that simulates individual values based on individual backstories that reflect a person's past experience and demographic information. SimVBG transforms structured data on an individual into a backstory and uses a multi-module architecture inspired by the Cognitive–Affective Personality System to simulate that individual's values from the backstory. We test SimVBG on a self-constructed benchmark derived from the World Values Survey and show that SimVBG improves top-1 accuracy by more than 10% over the retrieval-augmented generation method. Further analysis shows that performance increases as additional user interaction history becomes available, indicating that the model can refine its persona over time. Code, dataset, and complete experimental results are available at https://github.com/bangdedadi/SimVBG.

Co-authors

Minghao Guo 1

Songming He 1

Monika A. Jankowska 1

Weihang Su 1

Zhijing Wu 1

Yongfeng Zhang 1

Xi Zhu 1

Venues

EMNLP 1
Findings 1
