Hyuga Koretaka
2025
Targeted Source Text Editing for Machine Translation: Exploiting Quality Estimators and Large Language Models
Hyuga Koretaka | Atsushi Fujita | Tomoyuki Kajiwara
Proceedings of the Tenth Conference on Machine Translation
To improve the translation quality of “black-box” machine translation (MT) systems, we focus on the automatic editing of source texts to be translated. In addition to the use of a large language model (LLM) to implement robust and accurate editing, we investigate the usefulness of targeted editing, i.e., instructing the LLM with a text span to be edited. Our method determines such source text spans using a span-level quality estimator, which identifies actual translation errors caused by the MT system of interest, and a word aligner, which identifies alignments between the tokens in the source text and the translation hypothesis. Our empirical experiments with eight MT systems and ten test datasets for four translation directions confirmed the efficacy of our method in improving translation quality. Through analyses, we identified several characteristics of our method and found that the segment-level quality estimator is a vital component of our method.
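A minimal sketch of the targeted-editing loop described in the abstract, assuming hypothetical interfaces for the black-box MT system, the span-level quality estimator, the word aligner, and the LLM editor; the names `mt_system`, `span_qe`, `word_aligner`, and `llm_edit` are placeholders, not the authors' implementation.

```python
# Sketch of targeted source-text editing: translate, locate translation
# errors, project them onto the source, edit only those source spans with
# an LLM, and re-translate. All component callables are hypothetical.
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # character offsets [start, end)


def targeted_source_editing(
    source: str,
    mt_system: Callable[[str], str],             # black-box MT: source -> hypothesis
    span_qe: Callable[[str, str], List[Span]],   # error spans in the hypothesis
    word_aligner: Callable[[str, str], List[Tuple[Span, Span]]],  # (src span, hyp span) pairs
    llm_edit: Callable[[str, Span], str],        # rewrites the marked source span
) -> str:
    hypothesis = mt_system(source)

    # 1. The span-level QE flags spans in the hypothesis that look erroneous.
    error_spans = span_qe(source, hypothesis)
    if not error_spans:
        return hypothesis  # nothing to fix

    # 2. Word alignment projects each flagged hypothesis span back to the source.
    alignments = word_aligner(source, hypothesis)
    source_spans = [
        src for src, hyp in alignments
        if any(hyp[0] < end and start < hyp[1] for start, end in error_spans)
    ]

    # 3. The LLM edits the source only within the identified spans (targeted editing).
    edited_source = source
    for span in sorted(source_spans, reverse=True):
        edited_source = llm_edit(edited_source, span)

    # 4. Re-translate the edited source with the same black-box MT system.
    return mt_system(edited_source)
```

Editing the source spans from right to left keeps earlier character offsets valid as the edited text changes length.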
2023
Mitigating Domain Mismatch in Machine Translation via Paraphrasing
Hyuga Koretaka | Tomoyuki Kajiwara | Atsushi Fujita | Takashi Ninomiya
Proceedings of the 10th Workshop on Asian Translation
The quality of machine translation (MT) deteriorates significantly when translating texts whose characteristics, such as content domain, differ from those of the training data. Although previous studies have focused on adapting MT models with a bilingual parallel corpus in the target domain, this approach is not applicable when no parallel data are available for the target domain or when utilizing black-box MT systems. To mitigate problems caused by such domain mismatch without relying on any corpus in the target domain, this study proposes a method to search for better translations by paraphrasing the input texts of MT. To obtain better translations even for input texts from unforeseen domains, we generate multiple paraphrases of each input, translate each paraphrase, and rerank the resulting translations to select the most likely one. Experimental results on Japanese-to-English translation reveal that the proposed method improves translation quality in terms of BLEU score for input texts from specific domains.
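A minimal sketch of the paraphrase-translate-rerank procedure described in the abstract, assuming hypothetical stand-ins for the paraphrase generator, the black-box MT system, and the reranking scorer; `paraphraser`, `mt_system`, and `scorer` are placeholder names, not the authors' implementation.

```python
# Sketch of translation via source paraphrasing: generate paraphrases of the
# input, translate every candidate, and keep the translation the reranker
# scores highest. All component callables are hypothetical.
from typing import Callable, List


def translate_via_paraphrasing(
    source: str,
    paraphraser: Callable[[str, int], List[str]],  # source -> n paraphrases
    mt_system: Callable[[str], str],               # black-box MT: source -> translation
    scorer: Callable[[str, str], float],           # (source, translation) -> score, higher is better
    n_paraphrases: int = 8,
) -> str:
    # Always keep the original source among the candidates.
    candidates = [source] + paraphraser(source, n_paraphrases)

    # Translate every candidate with the same black-box MT system.
    translations = [mt_system(c) for c in candidates]

    # Rerank against the original source so paraphrasing cannot drift the meaning.
    return max(translations, key=lambda t: scorer(source, t))
```

Keeping the original source among the candidates ensures the selected output never scores worse than the baseline translation under the reranker.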