From English to Second Language Mastery: Enhancing LLMs with Cross-Lingual Continued Instruction Tuning

Linjuan Wu, Hao-Ran Wei, Baosong Yang, Weiming Lu


Abstract
Supervised Fine-Tuning (SFT) with translated instruction data effectively adapts Large Language Models (LLMs) from English to non-English languages. We introduce Cross-Lingual Continued Instruction Tuning (X-CIT), which fully leverages translation-based parallel instruction data to enhance cross-lingual adaptability. X-CIT emulates the human process of second language acquisition and is guided by Chomsky’s Principles and Parameters Theory. It first fine-tunes the LLM on English instruction data to establish foundational capabilities (i.e., Principles), then continues training with target-language translation and customized chat-instruction data to adjust the “parameters” specific to the target language. This chat-instruction data captures alignment information in the translated parallel data, guiding the model to initially think and respond in its native language before transitioning to the target language. To further mimic human learning progression, we incorporate Self-Paced Learning (SPL) during continued training, allowing the model to advance from simple to complex tasks. Implemented on Llama-2-7B across five languages, X-CIT was evaluated against three objective benchmarks and an LLM-as-a-judge benchmark, improving over the strongest baseline by an average of 1.97% and 8.2% on these two benchmarks, respectively.
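The abstract describes a two-stage training schedule (English "Principles" tuning followed by continued target-language tuning) with a self-paced easy-to-hard curriculum. The sketch below illustrates that schedule in outline only; the stage structure, the scalar difficulty score, and the `fine_tune` callable are assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch of the X-CIT training schedule described in the abstract.
# The difficulty score, number of curriculum stages, and the fine_tune
# callable are illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Example:
    prompt: str
    response: str
    difficulty: float  # assumed scalar used to order examples for SPL


def self_paced_subsets(examples: List[Example], num_stages: int = 3) -> Iterable[List[Example]]:
    """Yield progressively larger, easy-to-hard subsets (Self-Paced Learning)."""
    ranked = sorted(examples, key=lambda ex: ex.difficulty)
    for stage in range(1, num_stages + 1):
        cutoff = int(len(ranked) * stage / num_stages)
        yield ranked[:cutoff]  # curriculum grows to include harder examples


def train_x_cit(model, english_data: List[Example], target_lang_data: List[Example],
                fine_tune: Callable):
    # Stage 1 ("Principles"): instruction tuning on English data.
    fine_tune(model, english_data)
    # Stage 2 ("Parameters"): continued tuning on target-language translation and
    # chat-instruction data, ordered from simple to complex via self-paced learning.
    for subset in self_paced_subsets(target_lang_data):
        fine_tune(model, subset)
    return model
```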
Anthology ID:
2025.acl-long.1121
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
23006–23023
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1121/
Cite (ACL):
Linjuan Wu, Hao-Ran Wei, Baosong Yang, and Weiming Lu. 2025. From English to Second Language Mastery: Enhancing LLMs with Cross-Lingual Continued Instruction Tuning. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 23006–23023, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
From English to Second Language Mastery: Enhancing LLMs with Cross-Lingual Continued Instruction Tuning (Wu et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1121.pdf