When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training

Felicia Körner, Max Müller-Eberstein, Anna Korhonen, Barbara Plank


Abstract
Training Large Language Models (LLMs) with high multilingual coverage is becoming increasingly important — especially when monolingual resources are scarce. Recent studies have found that LLMs process multilingual inputs in shared concept spaces, thought to support generalization and cross-lingual transfer. However, these prior studies often do not use causal methods, lack deeper error analysis or focus on the final model only, leaving open how these spaces emerge *during training*. We investigate the development of language-agnostic concept spaces during pretraining of EuroLLM through the causal interpretability method of activation patching. We isolate cross-lingual concept representations, then inject them into a translation prompt to investigate how consistently translations can be altered, independently of the language. We find that *shared concept spaces emerge early and continue to refine*, but that *alignment with them is language-dependent*. Furthermore, in contrast to prior work, our fine-grained manual analysis reveals that some apparent gains in translation quality reflect shifts in behavior — like selecting senses for polysemous words or translating instead of copying cross-lingual homographs — rather than improved translation ability. Our findings offer new insight into the training dynamics of cross-lingual alignment and the conditions under which causal interpretability methods offer meaningful insights in multilingual contexts.
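The abstract names activation patching as the causal method: a concept representation is isolated and injected into a translation prompt. The sketch below shows that mechanic on a toy stack of layers. It runs inputs, caches a hidden state, and overwrites the hidden state of another run at the same layer. Several parts are assumptions and are not taken from the paper: the helper names (`run`, `mean_vector`, `concept_vector`), the toy layers, and the choice to isolate a language-agnostic concept by averaging the same concept's hidden states across languages. Averaging is one approach used in prior work, not necessarily the authors' exact procedure. In a real LLM the patched state would be a residual-stream activation at a specific token position in a transformer layer.

```python
from typing import Callable, List, Optional

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run(layers: List[Layer], x: Vector,
        patch_layer: Optional[int] = None,
        patch_value: Optional[Vector] = None,
        cache: Optional[List[Vector]] = None) -> Vector:
    """Forward pass through a toy layer stack (hypothetical helper).

    If `cache` is given, the output of every layer is recorded before any
    patching. If `patch_layer` and `patch_value` are given, the output of
    that layer is replaced by `patch_value` (the activation patch) and the
    remaining layers continue from the patched state.
    """
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        if cache is not None:
            cache.append(list(h))
        if i == patch_layer and patch_value is not None:
            h = list(patch_value)
    return h


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def concept_vector(layers: List[Layer], inputs: List[Vector],
                   layer_idx: int) -> Vector:
    """Assumed isolation step: average the hidden state at `layer_idx`
    over inputs that express the same concept in different languages."""
    states = []
    for x in inputs:
        cache: List[Vector] = []
        run(layers, x, cache=cache)
        states.append(cache[layer_idx])
    return mean_vector(states)


# Toy two-layer "model" (hypothetical, for illustration only).
layers: List[Layer] = [
    lambda h: [2.0 * v for v in h],  # layer 0
    lambda h: [v + 1.0 for v in h],  # layer 1
]

# Stand-ins for the same concept expressed in two languages.
concept = concept_vector(layers, [[1.0, 0.0], [0.0, 1.0]], layer_idx=0)

unpatched = run(layers, [5.0, 5.0])
patched = run(layers, [5.0, 5.0], patch_layer=0, patch_value=concept)
print(concept, unpatched, patched)  # [1.0, 1.0] [11.0, 11.0] [2.0, 2.0]
```

The patched run's output depends on the injected concept rather than on its own input from layer 0 onward. This is the kind of causal intervention the abstract uses to test whether translations can be altered independently of the source language.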
Anthology ID:
2026.eacl-long.145
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
3149–3169
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.145/
Cite (ACL):
Felicia Körner, Max Müller-Eberstein, Anna Korhonen, and Barbara Plank. 2026. When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3149–3169, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training (Körner et al., EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.145.pdf