Language Dominance in Multilingual Large Language Models

Nadav Shani, Ali Basirat
Abstract
This paper investigates the language dominance hypothesis in multilingual large language models (LLMs), which posits that cross-lingual understanding is facilitated by an implicit translation into a dominant language seen more frequently during pretraining. We propose a novel approach to quantify how languages influence one another in a language model. By analyzing the hidden states across intermediate layers of language models, we model interactions between language-specific embedding spaces using Gaussian Mixture Models. Our results reveal only weak signs of language dominance in the middle layers, affecting a small fraction of tokens. Our findings suggest that multilingual processing in LLMs is better explained by language-specific and shared representational spaces rather than internal translation into a single dominant language.
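
The mixture-model idea in the abstract can be illustrated with a small, self-contained sketch. The code below is not the authors' implementation; the model name (xlm-roberta-base), the layer index, the toy sentences, and the two-component mixture are all illustrative assumptions. It fits one Gaussian component per language to intermediate-layer token states and checks how many tokens of one language are assigned to the other language's component.

# Minimal sketch (not the paper's method): probe cross-lingual overlap of
# intermediate hidden states with a two-component Gaussian Mixture Model.
# Requires the transformers and scikit-learn packages.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.mixture import GaussianMixture

MODEL_NAME = "xlm-roberta-base"   # illustrative multilingual model
LAYER = 6                         # illustrative middle layer

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def hidden_states(sentences, layer=LAYER):
    """Collect hidden states of all non-special tokens at one layer."""
    vecs = []
    with torch.no_grad():
        for s in sentences:
            enc = tok(s, return_tensors="pt")
            out = model(**enc)
            h = out.hidden_states[layer][0]   # (seq_len, hidden_dim)
            vecs.append(h[1:-1].numpy())      # drop <s> and </s>
    return np.concatenate(vecs, axis=0)

# Toy sentences; a real analysis would use a larger (parallel) corpus.
english = ["The cat sleeps on the mat.", "She reads a book every evening."]
german  = ["Die Katze schläft auf der Matte.", "Sie liest jeden Abend ein Buch."]

en_vecs, de_vecs = hidden_states(english), hidden_states(german)

# Initialize one component per language mean; EM may still move the components.
gmm = GaussianMixture(
    n_components=2,
    covariance_type="diag",
    means_init=np.stack([en_vecs.mean(0), de_vecs.mean(0)]),
    random_state=0,
)
gmm.fit(np.concatenate([en_vecs, de_vecs], axis=0))

# Fraction of German tokens whose posterior favors the English-initialized component.
de_posteriors = gmm.predict_proba(de_vecs)
share = (de_posteriors.argmax(1) == 0).mean()
print(f"German tokens captured by the English component: {share:.1%}")

Under the dominance hypothesis, a large share of non-English tokens would fall under the English component in middle layers; largely separate components would instead point to language-specific representational spaces.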
Anthology ID:
2025.blackboxnlp-1.7
Volume:
Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Yonatan Belinkov, Aaron Mueller, Najoung Kim, Hosein Mohebbi, Hanjie Chen, Dana Arad, Gabriele Sarti
Venues:
BlackboxNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
137–148
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.blackboxnlp-1.7/
Cite (ACL):
Nadav Shani and Ali Basirat. 2025. Language Dominance in Multilingual Large Language Models. In Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 137–148, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Language Dominance in Multilingual Large Language Models (Shani & Basirat, BlackboxNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.blackboxnlp-1.7.pdf