Zibo Zhang
2025
NeighXLM: Enhancing Cross-Lingual Transfer in Low-Resource Languages via Neighbor-Augmented Contrastive Pretraining
Sicheng Wang
|
Wenyi Wu
|
Zibo Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025
Recent progress in multilingual pretraining has yielded strong performance on high-resource languages, albeit with limited generalization to genuinely low-resource settings. While prior approaches have attempted to enhance cross-lingual transfer through representation alignment or contrastive learning, they remain constrained by the extremely limited availability of parallel data to provide positive supervision in target languages. In this work, we introduce NeighXLM, a neighbor-augmented contrastive pretraining framework that enriches target-language supervision by mining semantic neighbors from unlabeled corpora. Without relying on human annotations or translation systems, NeighXLM exploits intra-language semantic relationships captured during pretraining to construct high-quality positive pairs. The approach is model-agnostic and can be seamlessly integrated into existing multilingual pipelines. Experiments on Swahili demonstrate the effectiveness of NeighXLM in improving cross-lingual retrieval and zero-shot transfer performance.