Testing Simulation Theory in LLMs’ Theory of Mind

Koshiro Aoki, Daisuke Kawahara


Abstract
Theory of Mind (ToM) is the ability to understand others’ mental states, which is essential for human social interaction. Although recent studies suggest that large language models (LLMs) exhibit human-level ToM capabilities, the underlying mechanisms remain unclear. “Simulation Theory” posits that we infer others’ mental states by simulating their cognitive processes, which has been widely discussed in cognitive science. In this work, we propose a framework for investigating whether the ToM mechanism in LLMs is based on Simulation Theory by analyzing their internal representations. Following this framework, we successfully steered LLMs’ ToM reasoning through modeled perspective-taking and counterfactual interventions. Our results suggest that Simulation Theory may partially explain the ToM mechanism in state-of-the-art LLMs, indicating parallels between human and artificial social reasoning.
Anthology ID:
2025.ijcnlp-srw.9
Volume:
The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Santosh T.y.s.s, Shuichiro Shimizu, Yifan Gong
Venue:
IJCNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
96–104
Language:
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-srw.9/
DOI:
Bibkey:
Cite (ACL):
Koshiro Aoki and Daisuke Kawahara. 2025. Testing Simulation Theory in LLMs’ Theory of Mind. In The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 96–104, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):
Testing Simulation Theory in LLMs’ Theory of Mind (Aoki & Kawahara, IJCNLP 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-srw.9.pdf