Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment

Somnath Banerjee; Sayan Layek; Pratyush Chatterjee; Animesh Mukherjee; Rima Hazra

doi:10.18653/v1/2025.findings-emnlp.497

Abstract

Ensuring consistent safety across multiple languages remains a significant challenge for large language models (LLMs). We introduce Soteria, a lightweight yet powerful strategy that locates and minimally adjusts the “functional heads” most responsible for harmful content generation in each language. By altering only a fraction of parameters, Soteria drastically reduces policy violations without sacrificing overall model performance, even in low-resource settings. To rigorously evaluate our approach, we also present XThreatBench, a specialized multilingual dataset capturing fine-grained harmful behaviors drawn from real policy guidelines. Experiments with leading open-source LLMs (e.g., Llama, Qwen, Mistral) show that Soteria consistently improves safety metrics across high-, mid-, and low-resource languages. These findings highlight a promising path toward scalable, linguistically attuned, and ethically aligned LLMs worldwide.

Anthology ID: 2025.findings-emnlp.497
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9347–9364
URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.497/
DOI: 10.18653/v1/2025.findings-emnlp.497
Cite (ACL): Somnath Banerjee, Sayan Layek, Pratyush Chatterjee, Animesh Mukherjee, and Rima Hazra. 2025. Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 9347–9364, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment (Banerjee et al., Findings 2025)
PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.497.pdf
Checklist: 2025.findings-emnlp.497.checklist.pdf