Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts

Youcheng Huang, Chen Huang, Duanyu Feng, Wenqiang Lei, Jiancheng Lv


Abstract
Understanding the inner workings of Large Language Models (LLMs) is a critical research frontier. Prior research has shown that a single LLM’s concept representations can be captured as steering vectors (SVs), enabling the control of LLM behavior (e.g., towards generating harmful content). Our work takes a novel approach by exploring the intricate relationships between concept representations across different LLMs, drawing an intriguing parallel to Plato’s Allegory of the Cave. In particular, we introduce a linear transformation method to bridge these representations and present three key findings: 1) Concept representations across different LLMs can be effectively aligned using simple linear transformations, enabling efficient cross-model transfer and behavioral control via SVs. 2) This linear transformation generalizes across concepts, facilitating alignment and control of SVs representing different concepts across LLMs. 3) A weak-to-strong transferability exists between LLM concept representations, whereby SVs extracted from smaller LLMs can effectively control the behavior of larger LLMs. Our code is provided in the supplementary file and will be openly released.
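Below is a minimal sketch of the kind of alignment the abstract describes: fitting a least-squares linear map between paired hidden states of a "source" and a "target" LLM, then using that map to carry a steering vector across models. This is not the authors' implementation; the hidden sizes, the synthetic data standing in for paired activations, and the ordinary-least-squares fit are illustrative assumptions only.

```python
# Hypothetical sketch (not the paper's code): learn a linear map W that sends
# source-model hidden states to target-model hidden states, then reuse W to
# transfer a steering vector (SV) extracted from the source model.
import numpy as np

rng = np.random.default_rng(0)

d_src, d_tgt, n_pairs = 2048, 4096, 512      # assumed hidden sizes of the two LLMs
H_src = rng.normal(size=(n_pairs, d_src))    # placeholder: source hidden states for n prompts
H_tgt = rng.normal(size=(n_pairs, d_tgt))    # placeholder: target hidden states for the same prompts

# Fit W minimizing ||H_src @ W - H_tgt||_F^2 via ordinary least squares.
W, *_ = np.linalg.lstsq(H_src, H_tgt, rcond=None)

# A steering vector from the source model (e.g., a difference of mean
# activations for a concept) is then mapped into the target model's space.
sv_src = rng.normal(size=(d_src,))           # placeholder steering vector
sv_tgt = sv_src @ W                          # transferred steering vector for the target LLM

print(W.shape, sv_tgt.shape)                 # (2048, 4096) (4096,)
```

In an actual experiment, `H_src` and `H_tgt` would come from running the same prompts through both models at chosen layers, and `sv_tgt` would be added to the target model's activations to test behavioral control; those steps are omitted here for brevity.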
Anthology ID:
2025.acl-long.185
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3686–3704
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.185/
Cite (ACL):
Youcheng Huang, Chen Huang, Duanyu Feng, Wenqiang Lei, and Jiancheng Lv. 2025. Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3686–3704, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts (Huang et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.185.pdf