NeighXLM: Enhancing Cross-Lingual Transfer in Low-Resource Languages via Neighbor-Augmented Contrastive Pretraining

Sicheng Wang, Wenyi Wu, Zibo Zhang


Abstract
Recent progress in multilingual pretraining has yielded strong performance on high-resource languages, albeit with limited generalization to genuinely low-resource settings. While prior approaches have attempted to enhance cross-lingual transfer through representation alignment or contrastive learning, they remain constrained by the scarcity of parallel data available to provide positive supervision in target languages. In this work, we introduce NeighXLM, a neighbor-augmented contrastive pretraining framework that enriches target-language supervision by mining semantic neighbors from unlabeled corpora. Without relying on human annotations or translation systems, NeighXLM exploits intra-language semantic relationships captured during pretraining to construct high-quality positive pairs. The approach is model-agnostic and can be seamlessly integrated into existing multilingual pipelines. Experiments on Swahili demonstrate the effectiveness of NeighXLM in improving cross-lingual retrieval and zero-shot transfer performance.
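For illustration only, the sketch below shows one way the core idea described in the abstract could look in code: mine each sentence's nearest semantic neighbor within an unlabeled monolingual batch and treat it as the positive in an InfoNCE-style contrastive loss. This is not the authors' implementation; the encoder, mining strategy, function names (mine_neighbors, neighbor_contrastive_loss), and the temperature value are assumptions.

import torch
import torch.nn.functional as F

def mine_neighbors(embeddings: torch.Tensor, k: int = 1) -> torch.Tensor:
    """Indices of the k nearest neighbors by cosine similarity,
    excluding each sentence itself, within an unlabeled batch."""
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.T
    sim.fill_diagonal_(float("-inf"))   # never pick a sentence as its own neighbor
    return sim.topk(k, dim=-1).indices  # shape (N, k)

def neighbor_contrastive_loss(embeddings: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """InfoNCE-style loss: the mined nearest neighbor is the positive,
    all other sentences in the batch serve as in-batch negatives."""
    z = F.normalize(embeddings, dim=-1)
    pos_idx = mine_neighbors(z, k=1).squeeze(-1)  # (N,) positive targets
    logits = (z @ z.T) / tau                      # pairwise similarities as logits
    logits.fill_diagonal_(float("-inf"))          # mask self-similarity
    return F.cross_entropy(logits, pos_idx)

# Hypothetical usage with any multilingual sentence encoder
# (e.g., mean-pooled XLM-R hidden states over Swahili sentences):
# emb = encode(batch_of_sentences)   # (N, d) tensor; `encode` is assumed
# loss = neighbor_contrastive_loss(emb)
# loss.backward()

In this sketch the positives come purely from intra-language neighbor mining, so no parallel data or translation system is required, matching the setting the abstract describes; the actual pretraining objective and mining procedure used in NeighXLM are given in the paper itself.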
Anthology ID:
2025.findings-emnlp.163
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3019–3030
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.163/
DOI:
10.18653/v1/2025.findings-emnlp.163
Cite (ACL):
Sicheng Wang, Wenyi Wu, and Zibo Zhang. 2025. NeighXLM: Enhancing Cross-Lingual Transfer in Low-Resource Languages via Neighbor-Augmented Contrastive Pretraining. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 3019–3030, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
NeighXLM: Enhancing Cross-Lingual Transfer in Low-Resource Languages via Neighbor-Augmented Contrastive Pretraining (Wang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.163.pdf
Checklist:
2025.findings-emnlp.163.checklist.pdf