CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering

Yumeng Wang, Zhiyuan Fan, Qingyun Wang, Yi R. Fung, Heng Ji


Abstract
Large Language Models (LLMs) are pretrained on extensive multilingual corpora to acquire both language-specific cultural knowledge and general knowledge. Ideally, LLMs should provide consistent responses to culture-independent questions across languages; in practice, however, we observe significant performance disparities. To address this, we explore the **C**ross-Lingual Self-**A**ligning ability of **L**anguage **M**odels (**CALM**) to align knowledge across languages. Specifically, for a given question, we sample multiple responses across different languages, select the most self-consistent response as the target, and use the remaining responses as negative examples. We then employ direct preference optimization (DPO) to align the model’s knowledge across different languages. Evaluations on the MEDQA and X-CSQA datasets demonstrate CALM’s effectiveness in enhancing cross-lingual knowledge question answering, in both zero-shot and retrieval-augmented settings. We also find that increasing the number of languages involved in CALM training leads to higher accuracy and consistency. Finally, we offer a qualitative analysis of how cross-lingual consistency can enhance knowledge alignment and explore the method’s generalizability.
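The abstract describes the core recipe: sample answers to the same question in several languages, pick the cross-lingually most self-consistent answer as the preferred target, and treat disagreeing answers as rejected examples for DPO. The following is a minimal sketch of that preference-pair construction, not the authors' released code; the sampler, language set, question, and majority-vote consistency criterion are illustrative assumptions.

```python
from collections import Counter
from typing import Callable

def build_preference_pairs(
    question: str,
    langs: list[str],
    sampler: Callable[[str, str, int], list[str]],
    k: int = 5,
):
    """Sample k answers per language, take the cross-lingual majority answer as the
    self-consistent target, and pair it against every disagreeing answer.

    `sampler(question, lang, k)` is a hypothetical helper that would query the LLM
    in `lang` and normalize each response to an answer label (e.g., 'A'-'E')."""
    per_lang = {lang: sampler(question, lang, k) for lang in langs}
    votes = Counter(ans for answers in per_lang.values() for ans in answers)
    target, _ = votes.most_common(1)[0]  # most self-consistent answer across languages

    pairs = []
    for lang, answers in per_lang.items():
        for ans in answers:
            if ans != target:
                # prompt/chosen/rejected triples are the usual input format for DPO training
                pairs.append({"lang": lang, "prompt": question,
                              "chosen": target, "rejected": ans})
    return pairs

if __name__ == "__main__":
    # Toy sampler standing in for real multilingual model calls (illustration only).
    toy_samples = {
        "en": ["B", "B", "B", "A", "B"],
        "es": ["B", "C", "B", "B", "B"],
        "zh": ["A", "B", "B", "B", "C"],
    }
    demo = build_preference_pairs(
        "Which vitamin deficiency causes scurvy?",
        langs=list(toy_samples),
        sampler=lambda q, lang, kk: toy_samples[lang][:kk],
    )
    print(f"{len(demo)} preference pairs built against the majority-vote target")
```

The resulting prompt/chosen/rejected records would then be fed to a standard DPO fine-tuning step, which is the alignment stage the abstract refers to.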
Anthology ID:
2025.findings-naacl.152
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2809–2817
URL:
https://preview.aclanthology.org/landing_page/2025.findings-naacl.152/
Cite (ACL):
Yumeng Wang, Zhiyuan Fan, Qingyun Wang, Yi R. Fung, and Heng Ji. 2025. CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 2809–2817, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering (Wang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-naacl.152.pdf