Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models

Ercong Nie, Helmut Schmid, Hinrich Schuetze


Abstract
Language confusion—where large language models (LLMs) generate text in an unintended language, contrary to the user's request—remains a critical challenge, especially for English-centric models. We present the first mechanistic interpretability (MI) study of language confusion, combining behavioral benchmarking with neuron-level analysis. Using the Language Confusion Benchmark (LCB), we show that confusion points (CPs)—specific positions where language switches occur—are central to this phenomenon. Through layer-wise analysis with TunedLens and targeted neuron attribution, we reveal that transition failures in the final layers drive confusion. We further demonstrate that editing a small set of critical neurons, identified via comparative analysis with a multilingual-tuned counterpart, substantially mitigates confusion while largely preserving general competence and fluency. Our approach matches multilingual alignment in confusion reduction for many languages and yields cleaner, higher-quality outputs. These findings provide new insights into the internal dynamics of LLMs and highlight neuron-level interventions as a promising direction for robust, interpretable multilingual language modeling.
Anthology ID:
2025.findings-emnlp.37
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
690–706
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.37/
DOI:
10.18653/v1/2025.findings-emnlp.37
Cite (ACL):
Ercong Nie, Helmut Schmid, and Hinrich Schuetze. 2025. Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 690–706, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models (Nie et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.37.pdf
Checklist:
2025.findings-emnlp.37.checklist.pdf