Debiasing Multilingual LLMs in Cross-lingual Latent Space

Qiwei Peng, Guimin Hu, Yekun Chai, Anders Søgaard


Abstract
Debiasing techniques such as SentDebias aim to reduce bias in large language models (LLMs). Previous studies have evaluated their cross-lingual transferability by directly applying these methods to LLM representations, revealing their limited effectiveness across languages. In this work, we therefore propose to perform debiasing in a joint latent space rather than directly on LLM representations. We construct a well-aligned cross-lingual latent space using an autoencoder trained on parallel TED talk scripts. Our experiments with Aya-expanse and two debiasing techniques across four languages (English, French, German, Dutch) demonstrate that a) autoencoders effectively construct a well-aligned cross-lingual latent space, and b) applying debiasing techniques in the learned cross-lingual latent space significantly improves both the overall debiasing performance and cross-lingual transferability.
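
For intuition, here is a minimal sketch of the pipeline the abstract describes: encode LLM hidden states into a shared cross-lingual latent space with an autoencoder, estimate a bias subspace there in the style of SentDebias (difference vectors of counterfactual sentence pairs, top principal directions), and project latent codes onto the orthogonal complement of that subspace before decoding. This is an illustrative NumPy sketch, not the authors' implementation: the random autoencoder weights stand in for one trained on parallel TED-talk scripts, and encode, decode, bias_subspace, and debias are hypothetical helper names.

import numpy as np

# Toy autoencoder latent space (stand-in for one trained on parallel
# TED-talk scripts; weights are random here, for illustration only).
rng = np.random.default_rng(0)
d_model, d_latent = 768, 256
W_enc = rng.normal(scale=0.02, size=(d_model, d_latent))
W_dec = rng.normal(scale=0.02, size=(d_latent, d_model))

def encode(h):   # LLM hidden state -> shared cross-lingual latent code
    return h @ W_enc

def decode(z):   # latent code -> back to the LLM representation space
    return z @ W_dec

def bias_subspace(pair_reprs, k=1):
    # pair_reprs: (n_pairs, 2, d_model) hidden states of counterfactual
    # sentence pairs (e.g. "he"/"she" templates), encoded into the latent
    # space; PCA on their differences yields the bias directions.
    z = encode(pair_reprs.reshape(-1, d_model)).reshape(-1, 2, d_latent)
    diffs = z[:, 0, :] - z[:, 1, :]
    diffs -= diffs.mean(axis=0)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k]                  # (k, d_latent), orthonormal rows

def debias(h, V):
    # Project the latent code onto the orthogonal complement of the
    # bias subspace, then decode back to the original space.
    z = encode(h)
    z_clean = z - (z @ V.T) @ V
    return decode(z_clean)

# Usage: random "sentence pairs" stand in for real counterfactual data.
pairs = rng.normal(size=(100, 2, d_model))
V = bias_subspace(pairs, k=2)
h = rng.normal(size=(d_model,))
print(debias(h, V).shape)          # (768,)

The design choice the abstract argues for is that the projection is applied to the latent code z rather than to the raw hidden state h, so a bias subspace estimated in the shared latent space can transfer across the languages that space aligns.
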
Anthology ID:
2025.emnlp-main.1149
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
22593–22604
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1149/
Cite (ACL):
Qiwei Peng, Guimin Hu, Yekun Chai, and Anders Søgaard. 2025. Debiasing Multilingual LLMs in Cross-lingual Latent Space. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 22593–22604, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Debiasing Multilingual LLMs in Cross-lingual Latent Space (Peng et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1149.pdf
Checklist:
2025.emnlp-main.1149.checklist.pdf