Sen Peng


2025

Improve Fluency Of Neural Machine Translation Using Large Language Models
Jianfei He | Wenbo Pan | Jijia Yang | Sen Peng | Xiaohua Jia
Proceedings of Machine Translation Summit XX: Volume 1

Large language models (LLMs) demonstrate significant capabilities in many natural language processing tasks. However, their performance in machine translation still lags behind models specially trained for machine translation with an encoder-decoder architecture. This paper investigates how to improve neural machine translation (NMT) with LLMs. Our proposal is based on an empirical insight that NMT achieves worse fluency than human translation. We propose using LLMs to enhance the fluency of NMT’s generation by integrating a language model at the target side. We use contrastive learning to constrain the fluency so that it does not exceed that of the LLMs. Our experiments on three language pairs show that this method improves the performance of NMT. Our empirical analysis further demonstrates that the method improves fluency at the target side. Our experiments also show that some straightforward post-processing methods using LLMs, such as re-ranking and refinement, are not effective.

2024

Contrastive Preference Learning for Neural Machine Translation
Jianfei He | Shichao Sun | Sen Peng | Jie Xu | Xiaohua Jia | Wenjie Li
Findings of the Association for Computational Linguistics: NAACL 2024

There exists a discrepancy between the token-level objective during training and the overall sequence-level quality that is expected from the model. This discrepancy leads to issues like exposure bias. To align the model with human expectations, sequence-level objectives are often used to fine-tune pre-trained models. In this paper, we introduce a contrastive preference model that enhances the traditional Plackett-Luce model by incorporating an indicator function. Building upon this novel preference model, we propose Contrastive Preference Learning (CPL), which uses offline samples with list-wise preferences to fine-tune a pre-trained model in Neural Machine Translation. Our experiments, conducted on three language pairs, demonstrate that CPL outperforms not only the vanilla Transformer model but also other token-level and sequence-level baselines. Furthermore, the ablation study highlights the essential role of the proposed indicator function in achieving this improvement.