Subaru Kimura


2025

We participated in the constrained English–Japanese track of the WMT 2025 General Machine Translation Task. Our system collected the outputs produced by multiple subsystems, each consisting of LLM-based translation and reranking models configured differently (e.g., in prompting strategy and context size), and reranked those outputs. Each subsystem generated multiple segment-level candidates and iteratively selected the most probable one to construct the document translation. We then reranked the document-level outputs from all subsystems to obtain the final translation. For reranking, we adopted a text-based LLM reranking approach with a reasoning model to take long contexts into account. Additionally, we built a bilingual dictionary on the fly from parallel corpora to make the system more robust to rare words.
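
As a rough illustration of the text-based reranking step, the sketch below prompts a long-context LLM to pick the best document-level candidate. The `llm_call` hook, the prompt wording, and the index-parsing fallback are illustrative assumptions, not the paper's actual setup.

```python
def rerank_documents(source_doc, candidate_docs, llm_call):
    """Text-based reranking: show the source and every candidate
    translation to an LLM and ask it to name the best one.
    `llm_call(prompt) -> str` is a hypothetical hook onto a
    long-context reasoning model (assumption)."""
    listing = "\n\n".join(
        f"[{i}]\n{doc}" for i, doc in enumerate(candidate_docs)
    )
    prompt = (
        "You are evaluating Japanese translations of an English document.\n\n"
        f"Source document:\n{source_doc}\n\n"
        f"Candidate translations:\n{listing}\n\n"
        "Reply with only the index of the most accurate and fluent candidate."
    )
    reply = llm_call(prompt)
    try:
        index = int(reply.strip().strip("[]"))
    except ValueError:
        index = 0  # fall back to the first candidate on unparsable output
    return candidate_docs[index % len(candidate_docs)]
```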
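
The on-the-fly bilingual dictionary could, for example, be mined from the parallel corpora by co-occurrence statistics. The sketch below ranks target tokens by a simple lift score; the function name `mine_translations`, the thresholds, and the scoring are stand-ins, since the abstract does not specify the mining method (word alignment would be a natural alternative).

```python
from collections import Counter

def mine_translations(src_word, parallel_corpus, top_n=3, min_count=2):
    """Rank target tokens by how much more often they appear in sentence
    pairs containing `src_word` than in the corpus overall.
    `parallel_corpus` is an iterable of (src_tokens, tgt_tokens) pairs."""
    cooc, background = Counter(), Counter()
    n_match = n_total = 0
    for src_tokens, tgt_tokens in parallel_corpus:
        n_total += 1
        tgt_types = set(tgt_tokens)
        background.update(tgt_types)
        if src_word in src_tokens:
            n_match += 1
            cooc.update(tgt_types)
    if n_match == 0:
        return []

    def lift(tok):
        # P(tok | pair contains src_word) / P(tok): values above 1
        # indicate association; common function words score near 1.
        return (cooc[tok] / n_match) / (background[tok] / n_total)

    ranked = sorted(
        (tok for tok, c in cooc.items() if c >= min_count),
        key=lift, reverse=True,
    )
    return ranked[:top_n]
```

Entries mined this way for rare source words could then be supplied to the translation models, e.g., as glossary hints in the prompt.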

2024

We participated in the constrained English–Japanese and Japanese–Chinese tracks of the WMT 2024 General Machine Translation Task. Our approach was to generate a large number of sentence-level translation candidates and select the most probable translation using minimum Bayes risk (MBR) decoding and document-level large language model (LLM) reranking. We first generated hundreds of translation candidates from multiple translation models and retained the top 30 candidates using MBR decoding. In addition, we continually pre-trained LLMs on target-language corpora to leverage document-level information. We then used these LLMs to sequentially select the most probable sentence in context, proceeding from the beginning of the document.
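
A minimal sketch of the MBR step, assuming a Monte Carlo estimate in which each candidate is scored by its average utility against the remaining candidates as pseudo-references, with the top 30 retained. The `token_f1` utility is a toy stand-in; in practice a trained metric such as COMET (or a lexical one such as chrF) would typically be used, and the abstract does not say which.

```python
def mbr_select(candidates, utility, top_k=30):
    """Score each candidate by its average utility against all other
    candidates (a Monte Carlo estimate of expected utility) and keep
    the top_k highest-scoring ones."""
    scored = []
    for i, hyp in enumerate(candidates):
        refs = candidates[:i] + candidates[i + 1:]
        score = sum(utility(hyp, ref) for ref in refs) / max(len(refs), 1)
        scored.append((score, hyp))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [hyp for _, hyp in scored[:top_k]]

def token_f1(hyp, ref):
    """Toy utility: F1 over word types shared by the two strings
    (a crude stand-in for a real metric such as COMET or chrF)."""
    hyp_tokens, ref_tokens = hyp.split(), ref.split()
    common = len(set(hyp_tokens) & set(ref_tokens))
    if common == 0:
        return 0.0
    precision = common / len(hyp_tokens)
    recall = common / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```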
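
The document-level selection can then be read as a greedy left-to-right search over the retained candidates, sketched below. `score(src, prefix, candidate)` is a hypothetical hook returning the LLM's conditional log-probability of a candidate given the source sentence and the translation prefix built so far.

```python
def select_sequentially(source_sents, candidates_per_sent, score):
    """Greedy left-to-right document construction: for each source
    sentence, pick the translation candidate to which the LLM assigns
    the highest conditional log-probability given the translation
    prefix chosen so far. `score` is an assumed scoring hook."""
    prefix = []
    for src, candidates in zip(source_sents, candidates_per_sent):
        best = max(candidates, key=lambda c: score(src, "\n".join(prefix), c))
        prefix.append(best)
    return "\n".join(prefix)
```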