AI Safety Lost in Translation: Evaluating the Effectiveness of English-Italian Cross-Lingual LLM Safety Alignment

Alessio Wu; Martim Brandao

Abstract

Large Language Models (LLMs) have been shown to be vulnerable to various issues of bias and safety, for which new safety alignment techniques have been proposed. In this paper, we investigate the degree to which such techniques improve safety in a non-English language, specifically Italian, both with and without access to safety training data in that language. We evaluate standard mitigation techniques and assess cross-lingual safety transfer by comparing English-only versus bilingual Supervised Fine-Tuning (SFT) on several small open-source LLMs: Qwen3, Llama3.2, and Gemma3. Results confirm a significant cross-lingual safety gap, with most models performing worse in Italian. We find that while prompt engineering is generally effective, the impact of SFT is highly inconsistent. English-only SFT occasionally failed to transfer safety improvements into Italian and even degraded the performance of some models. Furthermore, bilingual SFT repeatedly underperformed other mitigation methods. These findings demonstrate that safety alignment does not always generalize across languages and models, and that standard mitigation strategies can have unpredictable effects. We thus highlight the critical need for language-specific evaluation and dedicated multilingual safety research to ensure AI is developed equitably and safely for a global audience.

Anthology ID: 2026.lrec-main.296
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 3697–3713
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.296/
Cite (ACL): Alessio Wu and Martim Brandao. 2026. AI Safety Lost in Translation: Evaluating the Effectiveness of English-Italian Cross-Lingual LLM Safety Alignment. International Conference on Language Resources and Evaluation, main:3697–3713.
Cite (Informal): AI Safety Lost in Translation: Evaluating the Effectiveness of English-Italian Cross-Lingual LLM Safety Alignment (Wu & Brandao, LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.296.pdf