High-Dimensional Interlingual Representations of Large Language Models

Bryan Wilie, Samuel Cahyawijaya, Junxian He, Pascale Fung


Abstract
Large language models (LLMs) trained on massive multilingual datasets hint at the formation of interlingual constructs: a shared region in the representation space. However, evidence regarding this phenomenon is mixed, leaving it unclear whether these models truly develop unified interlingual representations or only partially aligned constructs. We explore 31 diverse languages varying in resource level, typology, and geographical region, and find that multilingual LLMs exhibit inconsistent cross-lingual alignment. To address this, we propose an interlingual representation framework that identifies both the shared interlingual semantic region and the fragmented components that exist due to representational limitations. We introduce the Interlingual Local Overlap (ILO) score to quantify interlingual alignment by comparing the local neighborhood structures of high-dimensional representations. We use ILO to investigate the impact of single-language fine-tuning on interlingual alignment in multilingual LLMs. Our results indicate that training exclusively on a single language disrupts the alignment in early layers, while freezing these layers preserves the alignment of interlingual representations, leading to improved cross-lingual generalization. These results validate our framework and metric for evaluating interlingual representations, and further underscore that interlingual alignment is crucial for scalable multilingual learning.
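The abstract describes ILO only at a high level, as a comparison of local neighborhood structures. As a rough illustration of that idea, and explicitly not the paper's exact definition, the sketch below scores how many of each item's k nearest neighbors (by cosine similarity) are shared between two languages' representations of the same parallel sentences. The function names `knn_indices` and `local_overlap`, the cosine metric, the default k, and the simple averaging are all assumptions made for illustration.

```python
import numpy as np


def knn_indices(X: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbors of each row of X by cosine similarity.

    The row itself is excluded from its own neighborhood.
    Assumes no row of X is the zero vector.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    np.fill_diagonal(sims, -np.inf)  # never count a point as its own neighbor
    # Sort by descending similarity and keep the first k columns.
    return np.argsort(-sims, axis=1)[:, :k]


def local_overlap(X_a: np.ndarray, X_b: np.ndarray, k: int = 10) -> float:
    """Hypothetical neighborhood-overlap score between two aligned representation sets.

    Row i of X_a and row i of X_b are assumed to encode the same (parallel)
    sentence in language A and language B, e.g. hidden states from one layer.
    Returns the mean fraction of shared k-nearest neighbors, in [0, 1].
    This is an illustrative stand-in, not the paper's ILO formula.
    """
    if X_a.shape[0] != X_b.shape[0]:
        raise ValueError("X_a and X_b must contain the same number of parallel items")
    if not 0 < k < X_a.shape[0]:
        raise ValueError("k must satisfy 0 < k < number of items")
    nn_a = knn_indices(X_a, k)
    nn_b = knn_indices(X_b, k)
    # For each parallel item, the fraction of its k neighbors that agree across languages.
    shared = [len(set(a) & set(b)) / k for a, b in zip(nn_a.tolist(), nn_b.tolist())]
    return float(np.mean(shared))
```

Because cosine similarity is invariant to orthogonal rotations, a rotated copy of the same representations scores close to 1.0 under this sketch, while unrelated random representations score near k/(n-1), the chance level for n items. A local measure of this kind therefore rewards matching neighborhood structure rather than identical coordinates.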
Anthology ID: 2025.sigtyp-1.14
Volume: Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month: August
Year: 2025
Address: Vienna, Austria
Editors: Michael Hahn, Priya Rani, Ritesh Kumar, Andreas Shcherbakov, Alexey Sorokin, Oleg Serikov, Ryan Cotterell, Ekaterina Vylomova
Venues: SIGTYP | WS
Publisher: Association for Computational Linguistics
Pages: 122–155
URL: https://preview.aclanthology.org/landing_page/2025.sigtyp-1.14/
Cite (ACL): Bryan Wilie, Samuel Cahyawijaya, Junxian He, and Pascale Fung. 2025. High-Dimensional Interlingual Representations of Large Language Models. In Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 122–155, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): High-Dimensional Interlingual Representations of Large Language Models (Wilie et al., SIGTYP 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.sigtyp-1.14.pdf