Jinzhe Tu
2026
LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
Junxiao Yang | Haoran Liu | Jinzhe Tu | Jiale Cheng | Zhexin Zhang | Shiyao Cui | Jiaqi Weng | Jialing Tao | Hui Xue | Hongning Wang | Han Qiu | Minlie Huang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) exhibit better safety performance in high-resource languages than in low-resource languages. We attribute this gap to a mismatch between the models' language-agnostic semantic understanding and their language-dominant safety alignment, which is biased toward high-resource languages. Building on this insight, we empirically identify the semantic bottleneck in LLMs: the intermediate layers in which the geometry of model representations is governed primarily by shared semantic content rather than language identity. We then propose Language-Agnostic Semantic Alignment (LASA), which anchors safety alignment directly in semantic bottlenecks. Experiments show that LASA substantially improves safety across all languages: the average attack success rate (ASR) drops from 24.7% to 2.8% on LLaMA-3.1-8B-Instruct and remains within 3–4% across Qwen2.5 and Qwen3 Instruct models (7B–32B). Our analysis and method also offer a representation-level perspective on LLM safety, suggesting that safety alignment requires anchoring safety understanding not in surface text but in the model's language-agnostic semantic space.
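
The abstract does not state how the semantic bottleneck is measured, so the following is only an illustrative sketch of one plausible probe, not the authors' procedure. It assumes each layer provides one pooled vector per (language, prompt) pair, and it scores a layer by how much more similar translations of the same prompt are than different prompts in the same language. The names `semantic_margin` and `find_bottleneck`, the cosine-margin score, and the toy vectors are all hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def semantic_margin(reps):
    """Score one layer (hypothetical metric, not from the paper).

    reps maps (language, prompt_id) -> pooled hidden-state vector.
    Returns mean similarity of same-meaning, cross-language pairs
    minus mean similarity of same-language, different-meaning pairs.
    """
    same_meaning, same_language = [], []
    for (key1, vec1), (key2, vec2) in combinations(reps.items(), 2):
        (lang1, pid1), (lang2, pid2) = key1, key2
        if pid1 == pid2 and lang1 != lang2:
            same_meaning.append(cosine(vec1, vec2))
        elif lang1 == lang2 and pid1 != pid2:
            same_language.append(cosine(vec1, vec2))
    return (sum(same_meaning) / len(same_meaning)
            - sum(same_language) / len(same_language))


def find_bottleneck(layers):
    """Return the index of the layer with the largest margin, plus all margins."""
    margins = [semantic_margin(reps) for reps in layers]
    best = max(range(len(margins)), key=margins.__getitem__)
    return best, margins


# Toy two-layer example with made-up 2-D vectors (assumption, not real data).
toy_layers = [
    # Layer 0: vectors cluster by language.
    {("en", 0): [1.0, 0.0], ("en", 1): [0.9, 0.1],
     ("sw", 0): [0.0, 1.0], ("sw", 1): [0.1, 0.9]},
    # Layer 1: vectors cluster by meaning.
    {("en", 0): [1.0, 0.0], ("en", 1): [0.0, 1.0],
     ("sw", 0): [0.9, 0.1], ("sw", 1): [0.1, 0.9]},
]

best_layer, margins = find_bottleneck(toy_layers)
```

In this toy setup the second layer has a positive margin and the first a negative one, so the second layer is selected. With a real model, the per-layer vectors would have to come from the model's hidden states on parallel prompts, and the choice of pooling and similarity measure would be further assumptions.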