Proceedings of the Twelfth Workshop on Asian Translation (WAT 2025)
Toshiaki Nakazawa | Isao Goto
Findings of the First Patent Claims Translation Task at WAT2025
Toshiaki Nakazawa | Takashi Tsunakawa | Isao Goto | Kazuhiro Kasada | Katsuhito Sudoh | Shoichi Okuyama | Takashi Ieda | Masaaki Nagata
This paper presents the results and findings of the first shared task on translating patent claims. We provide training, development, and test data for participants and perform human evaluation of the submitted translations. This time, two teams submitted their translation results. Our analysis of the human-annotated translation errors revealed not only general, domain-independent errors but also errors specific to patent translation. We also found that the human annotation itself exhibited some serious issues. In this paper, we report on these findings.
Ehime-U System with Judge and Refinement, Specialized Prompting, and Few-shot for the Patent Claim Translation Task at WAT 2025
Taishi Edamatsu | Isao Goto | Takashi Ninomiya
The Ehime University team participated in the Japanese-to-English Patent Claim Translation Task at WAT 2025. We experimented with (i) Judge and Refinement, (ii) Specialized Prompting, and (iii) Few-Shot approaches, using GPT-5 as the underlying LLM. Evaluation based on the LLM-as-a-Judge framework confirmed improvements for (i), while (ii) and (iii) showed no significant effects.
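A minimal sketch of how such a judge-and-refinement loop could be wired up, assuming an OpenAI-style chat completions API; the prompts, stopping criterion, and round limit are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical judge-and-refinement loop (not the authors' code).
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-5"  # the paper reports GPT-5 as the underlying LLM

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

def judge_and_refine(src: str, max_rounds: int = 2) -> str:
    # Initial translation of the Japanese patent claim.
    hyp = ask(f"Translate this Japanese patent claim into English:\n{src}")
    for _ in range(max_rounds):
        # Judge: have the LLM critique the current translation.
        critique = ask(f"Source claim:\n{src}\n\nTranslation:\n{hyp}\n\n"
                       "List any translation errors, or reply OK if none.")
        if critique.strip() == "OK":
            break
        # Refine: regenerate the translation using the critique.
        hyp = ask(f"Revise this translation of:\n{src}\n\n"
                  f"Current translation:\n{hyp}\n\nFix these issues:\n{critique}")
    return hyp
```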
UTSK25 at WAT2025 Patent Claims Translation/Evaluation Task
Haruto Azami | Yin Zhang | Futo Kajita | Nobuyori Nishimura | Takehito Utsuro
This paper presents the submission of UTSK25 for the English–Japanese and Japanese–English directions of the WAT2025 Patent Claims Translation/Evaluation Task. We use a single translation model for both translation directions, built from a large language model through monolingual and bilingual continual pretraining and bilingual supervised fine-tuning. Finally, we generate translations via prompt engineering designed to reduce omissions and hallucinations.
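Since the paper does not publish its prompts, the following is only an illustrative shape for a prompt intended to suppress omissions and hallucinations in patent-claim translation:

```python
# Hypothetical prompt template; the actual UTSK25 prompts are not public.
PROMPT = """Translate the following patent claim from {src_lang} to {tgt_lang}.
Rules:
- Translate every clause; do not omit any limitation or reference numeral.
- Do not add content that is absent from the source.
- Preserve the claim structure (preamble, transitional phrase, body).

Claim:
{claim}

Translation:"""

def build_prompt(claim: str, src_lang: str = "English",
                 tgt_lang: str = "Japanese") -> str:
    return PROMPT.format(src_lang=src_lang, tgt_lang=tgt_lang, claim=claim)
```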
Segmentation Beyond Defaults: Asymmetrical Byte Pair Encoding for Optimal Machine Translation Performance
Saumitra Yadav | Manish Shrivastava
Existing Machine Translation (MT) research often adopts a single, fixed hyperparameter setting for word segmentation: symmetric Byte Pair Encoding (BPE), which applies the same number of merge operations (NMO) when training tokenizers for both the source and target languages. However, we demonstrate that this uniform approach does not guarantee optimal MT performance across different language pairs and data sizes. This work investigates BPE segmentation recipes across various data volumes and language pairs to evaluate MT system performance. We find that asymmetric BPE, where the source and target languages use different NMOs, significantly improves results over the symmetric approach, especially in low-resource settings (50K, 100K, and 500K sentence pairs). Specifically, asymmetric BPE yields statistically significant (p<0.05) average gains of 5.32, 4.46, and 0.7 CHRF++ on English-Hindi in low-resource setups (50K, 100K, and 500K sentence pairs, respectively). We validated this trend across six additional language pairs (English↔Telugu, Shona, Norwegian, Kyrgyz, Hausa, and Inuktitut), observing statistically significant improvements in 10 out of 12 systems compared to symmetric BPE. Our findings indicate that a high NMO for the source (4K to 32K) and a low NMO for the target (0.5K to 2K) provides optimal results, particularly benefiting low-resource MT.
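A minimal sketch of the asymmetric setup using the subword-nmt toolkit; the file paths and the exact merge counts (8K source, 1K target, chosen from within the ranges the paper reports as optimal) are illustrative:

```python
# Asymmetric BPE: a different number of merge operations (NMO) per side.
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

SRC_MERGES = 8000  # high NMO for the source (paper: 4K to 32K works well)
TGT_MERGES = 1000  # low NMO for the target (paper: 0.5K to 2K works well)

# Learn separate merge tables for each side of the parallel corpus.
with open("train.en") as src_in, open("codes.en", "w") as src_codes:
    learn_bpe(src_in, src_codes, SRC_MERGES)
with open("train.hi") as tgt_in, open("codes.hi", "w") as tgt_codes:
    learn_bpe(tgt_in, tgt_codes, TGT_MERGES)

# Apply each tokenizer to its own side before NMT training.
with open("codes.en") as f:
    src_bpe = BPE(f)
with open("codes.hi") as f:
    tgt_bpe = BPE(f)

print(src_bpe.process_line("The quick brown fox jumps over the lazy dog."))
```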
Speech-to-Speech Machine Translation for Dialectal Variations of Hindi
Sanmay Sood | Siddharth Rajput | Md Shad Akhtar
Hindi has many dialects, which are vital to India’s cultural and linguistic heritage. However, many of them have been largely overlooked in modern language technology, primarily due to a lack of proper resources. In this study, we explore speech-to-speech machine translation (S2ST) for four Hindi dialects: Awadhi, Bhojpuri, Braj Bhasha, and Magahi. We adopt a cascaded S2ST pipeline comprising three stages: Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS). We evaluate several recent large language models (LLMs) for dialect-to-Hindi and dialect-to-English translation in zero-shot, few-shot, and chain-of-thought setups. Our comparative analysis offers insights into the current capabilities and limitations of LLM-based approaches for low-resource dialectal S2ST in the Indian context.
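A minimal sketch of such a cascade under stated assumptions: Whisper for ASR, an NLLB model for MT, and MMS-TTS for synthesis. These specific checkpoints are illustrative stand-ins, not necessarily the systems evaluated in the paper, and none of them has dedicated support for the four dialects:

```python
# Hypothetical ASR -> MT -> TTS cascade; the concrete models are assumptions.
import torch
import whisper
from transformers import AutoTokenizer, VitsModel, pipeline

# 1) ASR: transcribe dialectal speech (here approximated with Hindi decoding).
asr = whisper.load_model("large-v3")
text = asr.transcribe("bhojpuri_utterance.wav", language="hi")["text"]

# 2) MT: translate the transcript into English.
mt = pipeline("translation", model="facebook/nllb-200-distilled-600M",
              src_lang="hin_Deva", tgt_lang="eng_Latn")
english = mt(text)[0]["translation_text"]

# 3) TTS: synthesize speech from the translated text.
tts_tok = AutoTokenizer.from_pretrained("facebook/mms-tts-eng")
tts = VitsModel.from_pretrained("facebook/mms-tts-eng")
with torch.no_grad():
    waveform = tts(**tts_tok(english, return_tensors="pt")).waveform
```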
A Systematic Review on Machine Translation and Transliteration Techniques for Code-Mixed Indo-Aryan Languages
H. Rukshan Dias | Deshan Sumanathilaka
In multilingual societies, it is common to observe the blending of multiple languages in communication, a phenomenon known as code-mixing. Globalization and the increasing influence of social media have further amplified multilingualism, resulting in wider use of code-mixing. This systematic review analyzes existing translation and transliteration techniques for code-mixed Indo-Aryan languages, spanning approaches from rule-based and statistical methods to neural machine translation and transformer-based architectures. It also examines publicly available code-mixed datasets designed for machine translation and transliteration tasks, along with the evaluation metrics commonly introduced and applied in prior studies. Finally, the paper discusses current challenges and limitations, highlighting future research directions for developing more tailored translation pipelines for code-mixed Indo-Aryan languages.
CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation
Deepon Halder | Thanmay Jayakumar | Raj Dabre
Large language models (LLMs), despite their ability to perform few-shot machine translation (MT), often lag behind dedicated MT systems trained on parallel corpora, which remain crucial for high-quality MT. However, parallel corpora are often scarce or non-existent for low-resource languages. In this paper, we propose CycleDistill, a bootstrapping approach that leverages LLMs and few-shot translation to obtain high-quality MT systems. CycleDistill iteratively generates synthetic parallel corpora from monolingual corpora via zero- or few-shot MT and then uses them to fine-tune the same model that generated them. CycleDistill does not need parallel corpora beyond 1 to 4 few-shot examples, and in our experiments on three Indian languages, relying solely on monolingual corpora, it achieves high-quality machine translation, improving upon a few-shot baseline model by 20-30 chrF points on average in the first iteration. We also study the effect of leveraging softmax activations during the distillation process and observe mild improvements in translation quality.
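In outline, the cyclical distillation loop might look like the following sketch; `few_shot_translate` and `finetune` are hypothetical stubs standing in for an LLM inference call and a standard supervised fine-tuning run, since the paper's code is not reproduced here:

```python
# Schematic CycleDistill loop (illustrative; not the authors' implementation).
def few_shot_translate(model, text, exemplars, reverse=False):
    raise NotImplementedError("prompt the LLM with 1-4 exemplar pairs as context")

def finetune(model, parallel_pairs):
    raise NotImplementedError("standard supervised fine-tuning on the pairs")

def cycle_distill(model, mono_src, mono_tgt, exemplars, iterations=3):
    """Iteratively distill: synthesize bitext, then fine-tune the generator."""
    for _ in range(iterations):
        # 1) Synthesize parallel data from monolingual text via few-shot MT.
        synthetic = [(s, few_shot_translate(model, s, exemplars))
                     for s in mono_src]
        synthetic += [(few_shot_translate(model, t, exemplars, reverse=True), t)
                      for t in mono_tgt]
        # 2) Fine-tune the same model on the corpus it just generated.
        model = finetune(model, synthetic)
    return model
```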
Findings of the WAT 2025 Shared Task on Japanese-English Article-level News Translation
Naoto Shirai | Kazutaka Kinugawa | Hitoshi Ito | Hideya Mino | Yoshihiko Kawai
We present the preliminary findings of the WAT 2025 shared task on document-level translation from Japanese to English in the news domain. This task focuses on translating full articles, with particular attention to whether translation models can learn to produce the expressions and stylistic features typical of English news writing, with the aim of generating outputs that resemble original English news articles. The task consists of three translation styles: (1) literal translation, (2) news-style translation, based on English articles edited to match the Japanese content, and (3) finalized translation, the primary goal of this shared task. Only one team participated, submitting a system to a single subtask. All tasks were evaluated automatically, and one task was also evaluated manually to compare the submission with the baseline.
NHK Submission to WAT 2025: Leveraging Preference Optimization for Article-level Japanese–English News Translation
Hideya Mino | Rei Endo | Yoshihiko Kawai
This paper describes our submission to the Japanese → English Article-level News Translation Task at WAT 2025. In this task, participants were provided with a small but high-quality parallel corpus along with two intermediate English translations: a literal translation and a style-adapted translation. To effectively exploit these limited training data, our system employs a large language model (LLM) trained via supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO), a preference-learning technique for aligning model outputs with professional-quality references. By using the literal and style-adapted intermediate translations as negative (rejected) samples and the human-edited English articles as positive (chosen) samples in DPO training, we achieved notable improvements in translation quality. We evaluate our approach using BLEU scores and human assessments.
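A minimal sketch of the SFT-then-DPO recipe using the TRL library's standard DPOTrainer interface; the base checkpoint name and column contents are placeholders, and the chosen/rejected pairing follows the description above:

```python
# Hedged sketch: DPO training on (chosen, rejected) pairs with TRL.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "sft-checkpoint"  # placeholder: the LLM after the SFT stage
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Each row pairs a Japanese article (prompt) with a human-edited English
# article (chosen) and a literal/style-adapted intermediate (rejected).
pairs = Dataset.from_dict({
    "prompt":   ["Translate this Japanese article into English news style: ..."],
    "chosen":   ["<human-edited English article>"],
    "rejected": ["<literal or style-adapted intermediate translation>"],
})

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-out", beta=0.1),
    train_dataset=pairs,
    processing_class=tokenizer,
)
trainer.train()
```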
Findings of WAT2025 English-to-Indic Multimodal Translation Task
Shantipriya Parida | Ondřej Bojar
This paper presents the findings of the English-to-Indic Multimodal Translation shared task from the Workshop on Asian Translation (WAT2025). The task featured three tracks: text-only translation, image captioning, and multimodal translation across four low-resource Indic languages: Hindi, Bengali, Malayalam, and Odia. Three teams participated, submitting systems that achieved competitive performance, with BLEU scores ranging from 40.1 to 64.3 across different language pairs and tracks.
OdiaGenAI participation at WAT 2025
Debasish Dhal | Sambit Sekhar | Revathy V R | Shantipriya Parida | Akash Kumar Dhaka
We at OdiaGenAI provide a detailed description of the model, training procedure, results, and conclusions of our submission to the Workshop on Asian Translation (WAT 2025). This year, we focus only on text-to-text translation tasks for low-resource Indic languages, specifically targeting Hindi, Bengali, Malayalam, and Odia. The system uses a large language model, NLLB-200, fine-tuned on large datasets consisting of over 100K rows for each targeted language. The training dataset consists of the data provided by the organisers, as in previous years, augmented with a much larger set of 100K sentences subsampled from the Samanantar dataset provided by AI4Bharat. On five of the eight evaluation/challenge test sets, our approach obtained the highest BLEU scores recorded since the task's inception.
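A minimal sketch of fine-tuning NLLB-200 for one direction with Hugging Face transformers; the checkpoint size, placeholder bitext, and hyperparameters are illustrative assumptions, not the team's reported configuration:

```python
# Hedged sketch: fine-tuning NLLB-200 for English -> Odia.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

ckpt = "facebook/nllb-200-distilled-600M"  # assumption: distilled checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt, src_lang="eng_Latn",
                                          tgt_lang="ory_Orya")
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)

def preprocess(batch):
    return tokenizer(batch["en"], text_target=batch["or"],
                     truncation=True, max_length=256)

# Placeholder bitext; in practice, organiser data plus ~100K Samanantar pairs.
data = Dataset.from_dict({"en": ["Hello, world."], "or": ["ନମସ୍କାର ଦୁନିଆ।"]})
tokenized = data.map(preprocess, batched=True, remove_columns=["en", "or"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="nllb-en-or", num_train_epochs=3,
                                  per_device_train_batch_size=8,
                                  learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```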
Does Vision Still Help? Multimodal Translation with CLIP-Based Image Selection
Deepak Kumar | Baban Gain | Kshetrimayum Boynao Singh | Asif Ekbal
Multimodal Machine Translation aims to enhance conventional text-only translation systems by incorporating visual context, typically in the form of images paired with captions. In this work, we present our submission to the WAT 2025 Multimodal Translation Shared Task, which explores the role of visual information in translating English captions into four Indic languages: Hindi, Bengali, Malayalam, and Odia. Our system builds upon the strong multilingual text translation backbone IndicTrans, augmented with a CLIP-based selective visual grounding mechanism. Specifically, we compute cosine similarities between text and image embeddings (for both full images and cropped regions) and automatically select the most semantically aligned image representation to integrate into the translation model. We observe that the overall contribution of visual features is questionable. Our findings reaffirm recent evidence that large multilingual translation models can perform competitively without explicit visual grounding.
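A minimal sketch of the CLIP-based selection step, assuming the openly available openai/clip-vit-base-patch32 checkpoint; the paper's exact CLIP variant and cropping scheme are not reproduced here:

```python
# Hedged sketch: pick the image or crop whose CLIP embedding best matches the caption.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def best_visual_context(caption: str, image: Image.Image,
                        crops: list[Image.Image]):
    """Return the embedding of the candidate most aligned with the caption."""
    candidates = [image] + crops
    inputs = processor(text=[caption], images=candidates,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Normalize, then cosine similarity between the caption and each candidate.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    sims = (img @ txt.T).squeeze(-1)  # one score per candidate
    best = int(sims.argmax())
    return img[best], float(sims[best])
```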
A Picture is Worth a Thousand (Correct) Captions: A Vision-Guided Judge-Corrector System for Multimodal Machine Translation
Siddharth Betala | Kushan Raj | Vipul Betala | Rohan Saswade
In this paper, we describe our system, under the team name BLEU Monday, for the English-to-Indic Multimodal Translation Task at WAT 2025. We participate in the text-only translation tasks for the English-Hindi, English-Bengali, English-Malayalam, and English-Odia language pairs. We present a two-stage approach that addresses quality issues in the training data through automated error detection and correction, followed by parameter-efficient model fine-tuning. Our methodology introduces a vision-augmented judge-corrector pipeline that leverages multimodal language models to systematically identify and correct translation errors in the training data. The judge component classifies translations into three categories: correct, visually ambiguous (requiring image context), or mistranslated (poor translation quality). Identified errors are routed to specialized correctors: GPT-4o-mini regenerates captions requiring visual disambiguation, while IndicTrans2 retranslates cases with pure translation quality issues. This automated pipeline processes 28,928 training examples across four languages, correcting an average of 17.1% of captions per language. We then apply Low-Rank Adaptation (LoRA) to fine-tune the IndicTrans2 en-indic 200M distilled model on both the original and corrected datasets. Training on corrected data yields consistent improvements, with BLEU score gains of +1.30 for English-Bengali on the evaluation set (42.00 → 43.30) and +0.70 on the challenge set (44.90 → 45.60), +0.60 for English-Odia on the evaluation set (41.00 → 41.60), and +0.10 for English-Hindi on the challenge set (53.90 → 54.00).
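A minimal sketch of the LoRA fine-tuning stage using the PEFT library; the rank, alpha, dropout, and target modules below are illustrative defaults, not the values reported by the team:

```python
# Hedged sketch: wrapping IndicTrans2 (a seq2seq model) with LoRA adapters.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

base = AutoModelForSeq2SeqLM.from_pretrained(
    "ai4bharat/indictrans2-en-indic-dist-200M", trust_remote_code=True)

lora = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                      # assumption: adapter rank
    lora_alpha=32,             # assumption: scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: attention projections
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the low-rank adapters are trained
# ...then train with a standard Seq2SeqTrainer on the corrected bitext.
```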