Futo Kajita


2026

We propose a novel approach to translating Japanese slides into English and correcting their layout errors by utilizing multimodal LLMs with slide images and XML structures. Existing translation tools often suffer from layout errors after translation due to text expansion during the translation process, which causes text to overlap with figures or other items in slides and thereby reduces readability. To overcome this issue, our proposed framework introduces two steps: (i) translating text fragments within the slide, and (ii) correcting layout errors by optimizing layout placement based on visual consistency. In step (ii), we empirically show that few-shot prompts are quite effective in layout error correction. Given that the optimal combination of steps (i) and (ii) varies depending on the slide layout, our method generates eight different layout candidates. Consequently, we introduce a third step that automatically selects the optimal output from these eight candidates. The experimental results showed that the proposed method outperformed baselines and achieved a 4.1% layout error rate and a model selection success rate of over 80%.

2025

This paper presents the submission of UTSK25 for the English–Japanese and Japanese–English directions of the WAT2025 Patent Claims Translation/Evaluation Task. We use a single translation model for both translation directions, built from a large language model through monolingual and bilingual continual pretraining and bilingual supervised fine-tuning. Finally, we generate translations using prompt engineering to reduce omissions and hallucinations.