Representational Isomorphism and Alignment of Multilingual Large Language Models

Di Wu, Yibin Lei, Andrew Yates, Christof Monz


Abstract
In this paper, we investigate the capability of Large Language Models (LLMs) to represent texts in multilingual contexts. Our findings show that sentence representations derived from LLMs exhibit a high degree of isomorphism across languages. This existing isomorphism can facilitate representational alignments in zero-shot and few-shot settings. Specifically, by applying a contrastive objective at the representation level with only a small number of translation pairs (e.g., 100), we substantially improve models’ performance on Semantic Textual Similarity (STS) tasks across languages. This representation-level approach proves to be more efficient and effective for semantic alignment than continued pretraining or instruction tuning. Interestingly, we also observe substantial STS improvements within individual languages, even without a monolingual objective specifically designed for this purpose.
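To make the idea of a representation-level contrastive objective over translation pairs concrete, here is a minimal sketch. The abstract does not state the exact loss, so the symmetric InfoNCE form with in-batch negatives, the `temperature` value, and the names `info_nce`, `normalize`, `en`, `aligned`, and `mismatched` are all assumptions made for illustration, and the embeddings are random stand-ins for LLM sentence representations. It uses only the standard library; actual training would need an autodiff framework.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(src, tgt, temperature=0.05):
    """Symmetric InfoNCE loss (an assumed choice, not confirmed by the abstract).

    src[i] and tgt[i] are embeddings of a sentence and its translation;
    every other target in the batch serves as a negative for src[i].
    """
    src = [normalize(v) for v in src]
    tgt = [normalize(v) for v in tgt]
    n = len(src)
    # Temperature-scaled cosine similarity between every source/target pair.
    sims = [[sum(a * b for a, b in zip(s, t)) / temperature for t in tgt]
            for s in src]

    def one_direction(rows):
        # Cross-entropy with the diagonal entry as the correct class,
        # computed with a max-shifted log-sum-exp for numerical stability.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / n

    cols = [list(c) for c in zip(*sims)]  # target-to-source direction
    return 0.5 * (one_direction(sims) + one_direction(cols))


# Toy data: 8 "sentences" with 16-dimensional stand-in embeddings.
random.seed(0)
en = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
# Near-identical vectors imitate well-aligned translations.
aligned = [[x + random.gauss(0, 0.01) for x in v] for v in en]
# Reversing the order pairs every sentence with the wrong translation.
mismatched = aligned[::-1]

loss_aligned = info_nce(en, aligned)
loss_mismatched = info_nce(en, mismatched)
```

In this toy setup the loss is lower when each sentence is paired with its own translation than when the pairs are scrambled, which is the signal such an objective would use to pull translations together in representation space.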
Anthology ID:
2024.findings-emnlp.823
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
14074–14085
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-emnlp.823/
DOI:
10.18653/v1/2024.findings-emnlp.823
Cite (ACL):
Di Wu, Yibin Lei, Andrew Yates, and Christof Monz. 2024. Representational Isomorphism and Alignment of Multilingual Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 14074–14085, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Representational Isomorphism and Alignment of Multilingual Large Language Models (Wu et al., Findings 2024)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-emnlp.823.pdf