Shangke Lyu
2026
ToM-Synth: Scaling Robust Theory of Mind in LLMs via 6,912 Structured Social Units
Guiyang Hou | Xiang Huang | Shangke Lyu | Yuchuan Wu | Weiyao Luo | Xinyu Mei | Yongliang Shen | Weiming Lu | Yongbin Li
Findings of the Association for Computational Linguistics: ACL 2026
Theory of Mind (ToM), the ability to infer others’ mental states from behavior, is pivotal for developing machines with human-level social intelligence. Existing methods for endowing LLMs with ToM fall into two paradigms: training-free methods and methods that repurpose ToM evaluation benchmarks as training data for RL-based fine-tuning. However, training-free methods fail to internalize the augmented ToM into the LLMs. Meanwhile, using evaluation benchmarks as training sources is conceptually problematic and, in practice, results in narrow in-domain overfitting rather than robust ToM. To address the lack of training resources within the ToM community and to empower LLMs with robust ToM, we introduce ToM-Synth, a factorial combinatorial synthesis framework of 6,912 social units. This framework enables the systematic synthesis of ToM data, yielding a training dataset of 27,648 instances, termed ToM-Synth-27K. In our experiments, RL fine-tuning on ToM-Synth-27K yields consistent and significant improvements across models of varying families and scales on ToM, Emotional Intelligence, and Social Commonsense benchmarks. Furthermore, we observe concurrent enhancements in IQ-related tasks (math, science, logic) and effective performance scaling with increasing data scale.