SYSTRAN @ WMT 2025 General Translation Task
Dakun Zhang, Yara Khater, Ramzi Rahli, Anna Rebollo, Josep Crego
Abstract
We present an English-to-Japanese translation system built upon the EuroLLM-9B (Martins et al., 2025) model. The training process involves two main stages: continued pretraining (CPT) and supervised fine-tuning (SFT). After both stages, we further tuned the model on a development set to optimize performance. For training data, we employed both basic filtering techniques and high-quality filtering strategies to ensure data cleanliness. Additionally, we classified both the training data and the development data into four domains, and we trained and fine-tuned with domain-specific prompts. Finally, we applied Minimum Bayes Risk (MBR) decoding and paragraph-level reranking as post-processing to enhance translation quality.
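The post-processing step mentioned above, Minimum Bayes Risk (MBR) decoding, selects from a pool of sampled candidate translations the one with the highest expected utility against the other candidates. The paper's exact setup is not given in this abstract, so the sketch below is illustrative only: it assumes chrF (via sacrebleu) as the utility metric, and `mbr_select` is a hypothetical helper name, not the authors' implementation.

```python
# Minimal sketch of sample-based MBR decoding over candidate translations.
# Assumption: chrF (sacrebleu) as the utility; the paper does not state
# which utility metric was used, so treat this as illustrative.
from sacrebleu.metrics import CHRF

chrf = CHRF()

def mbr_select(candidates: list[str]) -> str:
    """Pick the candidate whose average chrF against all other
    candidates (used as pseudo-references) is highest."""
    if len(candidates) == 1:
        return candidates[0]
    best, best_score = candidates[0], float("-inf")
    for i, hyp in enumerate(candidates):
        refs = candidates[:i] + candidates[i + 1:]
        # Monte Carlo estimate of the expected utility for this hypothesis.
        score = sum(chrf.sentence_score(hyp, [r]).score for r in refs) / len(refs)
        if score > best_score:
            best, best_score = hyp, score
    return best

# Example: pick the consensus translation from sampled hypotheses.
samples = ["The cat sat down.", "The cat sat.", "The cat sat down quietly."]
print(mbr_select(samples))
```

Selecting against the other samples rewards the "consensus" hypothesis, which tends to be more robust than the single highest-probability beam output.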
- Anthology ID: 2025.wmt-1.35
- Volume: Proceedings of the Tenth Conference on Machine Translation
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venue: WMT
- Publisher: Association for Computational Linguistics
- Pages: 599–606
- URL: https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.35/
- Cite (ACL): Dakun Zhang, Yara Khater, Ramzi Rahli, Anna Rebollo, and Josep Crego. 2025. SYSTRAN @ WMT 2025 General Translation Task. In Proceedings of the Tenth Conference on Machine Translation, pages 599–606, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): SYSTRAN @ WMT 2025 General Translation Task (Zhang et al., WMT 2025)
- PDF: https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.35.pdf