Simple Test Time Scaling for Machine Translation: Kaze-MT at the WMT25 General Translation Task

Shaomu Tan

Abstract

This paper describes the Kaze-MT submission to the WMT25 General Machine Translation task (Japanese–Chinese). Our system deliberately adopts a minimalist Test-Time Scaling (TTS) pipeline with three stages (Sampling, Scoring, and Selection) and avoids any task-specific fine-tuning, in-context exemplars, or bespoke decoding heuristics. In the sampling stage, we use the zero-shot Qwen2.5-72B-Instruct model to generate 512 candidate translations under a fixed temperature schedule designed to encourage lexical and syntactic diversity without sacrificing fluency. In the scoring stage, each candidate is evaluated by three reference-free quality estimation (QE) models: KIWI-22, MetricX-24 Hybrid-XXL, and Remedy-24-9B. The selection stage aggregates the metric-specific rankings and chooses the candidate with the lowest mean rank, which we found more stable than averaging raw scores across heterogeneous ranges. We submit to both the constrained and unconstrained tracks with minimal configuration changes. According to the official preliminary results, our submissions are competitive on automatic metrics; in human evaluation, Kaze-MT falls within the 8–13 rank cluster, performing comparably to CommandA-WMT and DeepSeek-V3 and outperforming other LLM baselines such as Mistral-Medium as well as other extensively tuned MT systems.
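
The selection stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy scores, the metric keys, the per-metric direction flags, and the tie handling (tied candidates share their average rank) are all assumptions made for illustration. Each metric's scores are converted to ranks, so the differing score ranges never mix, and the candidate with the lowest mean rank is returned.

```python
from typing import Dict, List, Sequence


def ranks(scores: Sequence[float], higher_is_better: bool = True) -> List[float]:
    """Return 1-based ranks (1 = best); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of candidates tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def select_by_mean_rank(metric_scores: Dict[str, Sequence[float]],
                        higher_is_better: Dict[str, bool]) -> int:
    """Return the index of the candidate with the lowest mean rank."""
    per_metric = [ranks(scores, higher_is_better[name])
                  for name, scores in metric_scores.items()]
    n = len(per_metric[0])
    mean_ranks = [sum(r[i] for r in per_metric) / len(per_metric)
                  for i in range(n)]
    # On equal mean ranks, min() keeps the earliest candidate.
    return min(range(n), key=mean_ranks.__getitem__)


# Toy scores for three candidates (illustrative numbers, not real metric output).
toy_scores = {
    "qe_a": [0.80, 0.85, 0.70],  # assumed higher-is-better
    "qe_b": [3.0, 2.0, 5.0],     # assumed lower-is-better (error-style score)
    "qe_c": [0.60, 0.50, 0.40],  # assumed higher-is-better
}
toy_direction = {"qe_a": True, "qe_b": False, "qe_c": True}
best = select_by_mean_rank(toy_scores, toy_direction)
```

In this toy case the second candidate (index 1) is ranked first by two of the three metrics and second by the other, so it has the lowest mean rank. Because only ranks are averaged, a metric on a 0–25 scale cannot dominate one on a 0–1 scale, which is the motivation the abstract gives for preferring rank aggregation over raw-score averaging.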

Anthology ID: 2025.wmt-1.40
Volume: Proceedings of the Tenth Conference on Machine Translation
Month: November
Year: 2025
Address: Suzhou, China
Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue: WMT
Publisher: Association for Computational Linguistics
Pages: 651–656
URL: https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.40/
Cite (ACL): Shaomu Tan. 2025. Simple Test Time Scaling for Machine Translation: Kaze-MT at the WMT25 General Translation Task. In Proceedings of the Tenth Conference on Machine Translation, pages 651–656, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Simple Test Time Scaling for Machine Translation: Kaze-MT at the WMT25 General Translation Task (Tan, WMT 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.40.pdf
