Aili Chen
2026
HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing
Chengyu Du | Xintao Wang | Aili Chen | Weiyuan Li | Rui Xu | Junteng Liu | Zishan Huang | Rong Tian | Zijun Sun | Yuhao Li | Liheng Feng | Deming Ding | Pengyu Zhao | Yanghua Xiao
Findings of the Association for Computational Linguistics: ACL 2026
LLM role-playing, i.e., using large language models (LLMs) to simulate specific personas, has emerged as a key capability in various applications, such as companionship, content creation, and digital games. While current models effectively capture character tones and knowledge, simulating the inner thoughts behind their behaviors remains a non-trivial challenge. Towards cognitive simulation in LLM role-play, previous efforts have mainly suffered from two critical deficiencies: the lack of high-quality datasets with explicit reasoning traces and the absence of reliable reward signals aligned with human preferences. In this paper, we propose HER (Human Emulation Reasoning), a unified framework for cognitive-level persona simulation. HER introduces a dual-layer thinking mechanism that strictly distinguishes characters’ first-person thinking processes from LLMs’ third-person reasoning. To bridge the aforementioned gaps, we curate a reasoning-augmented role-playing dataset via a reverse engineering strategy for supervised learning, and construct human-aligned evaluation principles and preference-based reward models for role-play reinforcement learning. Leveraging these resources, we train HER models based on the Qwen3-32B backbone via a hybrid paradigm of supervised learning (SL) and reinforcement learning from human feedback (RLHF). Extensive experiments validate the effectiveness of our approach. Notably, our models significantly outperform the Qwen3-32B baseline, achieving a 30.26% improvement on the CoSER benchmark and a 14.97% improvement on the MiniMax Benchmark. Our datasets, evaluation principles, and trained models will be released to facilitate future research in cognitive-level LLM role-playing.
Can LLMs Learn to Map the World from Local Descriptions?
Sirui Xia | Aili Chen | Xintao Wang | Tinghui Zhu | Yikai Zhang | Jiangjie Chen | Yanghua Xiao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent advances in Large Language Models (LLMs) have demonstrated strong capabilities in tasks such as code generation and mathematical reasoning. However, their potential to internalize structured spatial knowledge remains underexplored. This study investigates whether LLMs, grounded in locally relative human observations, can construct coherent global spatial cognition by integrating fragmented relational descriptions. We focus on two core aspects of spatial cognition: spatial perception, where models infer consistent global layouts from local positional relationships, and spatial navigation, where models learn road connectivity from trajectory data and plan optimal paths between unconnected locations. Experiments conducted in a simulated urban environment demonstrate that LLMs not only generalize to unseen spatial relationships between points of interest (POIs) but also exhibit latent representations aligned with real-world spatial distributions. Furthermore, LLMs can learn road connectivity from trajectory descriptions, enabling accurate path planning and dynamic spatial awareness during navigation.
2025
SELFGOAL: Your Language Agents Already Know How to Achieve High-level Goals
Ruihan Yang | Jiangjie Chen | Yikai Zhang | Siyu Yuan | Aili Chen | Kyle Richardson | Yanghua Xiao | Deqing Yang
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Language agents powered by large language models (LLMs) are increasingly valuable as decision-making tools in domains such as gaming and programming. However, these agents often face challenges in achieving high-level goals without detailed instructions and in adapting to environments where feedback is delayed. In this paper, we present SELFGOAL, a novel automatic approach designed to enhance agents’ capabilities to achieve high-level goals with limited human prior and environmental feedback. The core concept of SELFGOAL involves adaptively breaking down a high-level goal into a tree structure of more practical subgoals during the interaction with environments while identifying the most useful subgoals and progressively updating this structure. Experimental results demonstrate that SELFGOAL significantly enhances the performance of language agents across various tasks, including competitive, cooperative, and deferred feedback environments.
DEEPER Insight into Your User: Directed Persona Refinement for Dynamic Persona Modeling
Aili Chen | Chengyu Du | Jiangjie Chen | Jinghan Xu | Yikai Zhang | Siyu Yuan | Zulong Chen | Liangyue Li | Yanghua Xiao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
To advance personalized applications such as recommendation systems and user behavior prediction, recent research increasingly adopts large language models (LLMs) for human-readable persona modeling. In dynamic real-world scenarios, effective persona modeling necessitates leveraging streaming behavior data to continually optimize user personas. However, existing methods, whether regenerating personas or incrementally extending them with new behaviors, often fail to achieve sustained improvements in persona quality or future behavior prediction accuracy. To address this, we propose DEEPER, a novel approach for dynamic persona modeling that enables continual persona optimization. Specifically, we enhance the model’s direction-search capability through an iterative reinforcement learning framework, allowing it to automatically identify effective update directions and optimize personas using discrepancies between user behaviors and model predictions. Extensive experiments on dynamic persona modeling involving 4,800 users across 10 domains highlight DEEPER’s superior persona optimization capabilities, delivering an impressive 32.2% average reduction in user behavior prediction error over four update rounds and outperforming the best baseline by a remarkable 22.92%.