ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing

Xi Chen, Wei Xue, Yike Guo


Abstract
Role-playing has garnered rising attention as it provides a strong foundation for human-machine interaction and facilitates sociological research. However, current work is confined to textual modalities, neglecting speech, which plays a predominant role in daily life, thus limiting genuine role-playing. To bridge this gap, we conceptualize and benchmark speech role-playing through ActorMindBench, and we present a corresponding reasoning framework, called ActorMind. Specifically, (1) Speech Role-Playing enables models to deliver spontaneous responses with personalized verbal traits based on their role, the scene, and spoken dialogue. (2) ActorMindBench is a hierarchical benchmark comprises Utterance-Level content with 7,653 utterances, Scene-Level content with 313 scenes, and Role-Level content with 6 roles. (3) ActorMind is an off-the-shelf, multi-agent, chain-of-though style reasoning framework that emulates how human actors perform in theaters. Concretely, ActorMind first reads its assigned role description via Eye Agent, then comprehends emotional cues within contextual spoken dialogues through Ear Agent. Subsequently, Brain Agent generates a descriptive emotional state, and finally, Mouth Agent delivers the scripts infused with corresponding emotion state. Experimental results demonstrate the effectiveness of ActorMind in enhancing speech role-playing. The project page is available at https://github.com/OzymandiasChen/ActorMind.
Anthology ID:
2026.findings-acl.1718
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
34399–34413
Language:
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1718/
DOI:
Bibkey:
Cite (ACL):
Xi Chen, Wei Xue, and Yike Guo. 2026. ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing. In Findings of the Association for Computational Linguistics: ACL 2026, pages 34399–34413, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing (Chen et al., Findings 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1718.pdf
Checklist:
 2026.findings-acl.1718.checklist.pdf