Multiway Parallel Corpus in Forced Migration Domain for Multilingual Machine Translation

Fatemeh Azadi, Samuel Larkin, Chi-kiu Lo


Abstract
High-quality domain-specific parallel corpora play a significant role in improving the performance of machine translation (MT) and multilingual natural language processing (NLP) systems in a target domain. However, most existing multilingual parallel corpora focus on general-purpose data, and the majority of highly specialized domains, such as forced migration, suffer from a lack of multilingual data. In this work, we present a new high-quality 4-way parallel corpus in the forced migration domain. The corpus consists of human-translated journal articles from Forced Migration Review in English, French, Spanish, and Arabic. It contains data aligned at both the document and sentence levels across the four languages and provides a clean and reliable 4-way parallel resource for multilingual research in forced migration. Using this dataset, we benchmark several open-weight large language models (LLMs), an open-weight multilingual MT system, online closed MT systems, and a closed LLM across 12 translation directions. We further leverage our corpus to improve the MT quality of a top-performing multilingual foundation model with two common domain adaptation approaches: fine-tuning and few-shot prompting. Our results demonstrate the effectiveness of our corpus in improving the translation performance of current models in the forced migration domain.
Anthology ID:
2026.lrec-main.384
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
4889–4901
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.384/
Cite (ACL):
Fatemeh Azadi, Samuel Larkin, and Chi-kiu Lo. 2026. Multiway Parallel Corpus in Forced Migration Domain for Multilingual Machine Translation. International Conference on Language Resources and Evaluation, main:4889–4901.
Cite (Informal):
Multiway Parallel Corpus in Forced Migration Domain for Multilingual Machine Translation (Azadi et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.384.pdf